
Implementation of the Lucas-Kanade Optical Flow Algorithm on Graphics Processors using NVIDIA CUDA

Tuesday September 25, 2007
Hamerschlag Hall D-210
4:30 pm



Kevin Biswas
Carnegie Mellon University

General-purpose computing with programmable graphics processing units (GPUs) exhibits great potential to solve a wide variety of numerically intensive problems significantly faster than conventional CPUs, owing to the GPU's massively parallel architecture and superior memory bandwidth. The primary challenge, however, has been to overcome the constraints of working within a graphics programming environment and harness the GPU's available computational power. In early 2007, NVIDIA released its Compute Unified Device Architecture (CUDA) platform, a co-designed hardware and software architecture for issuing and managing computations on the G80 GPU without the need to map them to a graphics API.
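As a rough illustration of that programming model (our own sketch, not code from the talk; all names are illustrative), the example below shows how CUDA expresses a computation as a plain C function executed by a grid of threads and launched directly, with no graphics API involved:

    #include <cuda_runtime.h>

    // Minimal sketch: each thread adds one pair of vector elements.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n) c[i] = a[i] + b[i];
    }

    int main(void) {
        const int n = 1 << 20;
        float *a, *b, *c;
        cudaMalloc(&a, n * sizeof(float));               // device allocations
        cudaMalloc(&b, n * sizeof(float));
        cudaMalloc(&c, n * sizeof(float));
        // ... copy input data into a and b with cudaMemcpy ...
        int threads = 256;                               // threads per block
        int blocks = (n + threads - 1) / threads;        // blocks in the grid
        vecAdd<<<blocks, threads>>>(a, b, c, n);         // direct kernel launch
        cudaDeviceSynchronize();                         // wait for completion
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }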

As the programmability of GPUs and their associated programming tools improve, more attention is being paid to accelerating compute-intensive algorithms in computer vision. One of the most demanding of these is optical flow, a method used to estimate the motion of objects within a visual scene. The technique is often used in real-time applications such as robot navigation and collision avoidance, but is generally limited to low frame rates and small image sizes because these systems rely on CPU implementations. The Lucas-Kanade method is a popular variant of optical flow that calculates the motion between two successive image frames at every pixel position. The algorithm is unique in that its structure is non-iterative; this characteristic, together with its inherent data parallelism, makes it particularly attractive for real-time implementation on GPUs.
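To make the per-pixel structure concrete, below is a hedged sketch of the core Lucas-Kanade step (our illustration, not the speaker's implementation; the 5x5 window and central-difference gradients are assumptions). Each thread handles one pixel: it accumulates products of the spatial gradients Ix, Iy and the temporal gradient It over a window, then solves the resulting 2x2 linear system for the flow vector (u, v) in closed form:

    // Hypothetical per-pixel Lucas-Kanade kernel: one thread per pixel solves
    // [sxx sxy; sxy syy][u; v] = -[sxt; syt], built from image gradients
    // summed over a (2R+1) x (2R+1) window.
    __global__ void lucasKanade(const float* I0, const float* I1,
                                float* u, float* v, int w, int h) {
        const int R = 2;                                  // assumed window radius
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= R || y <= R || x >= w - 1 - R || y >= h - 1 - R) return;

        float sxx = 0.f, sxy = 0.f, syy = 0.f, sxt = 0.f, syt = 0.f;
        for (int j = -R; j <= R; ++j) {
            for (int i = -R; i <= R; ++i) {
                int p = (y + j) * w + (x + i);
                float Ix = 0.5f * (I0[p + 1] - I0[p - 1]); // central differences
                float Iy = 0.5f * (I0[p + w] - I0[p - w]);
                float It = I1[p] - I0[p];                  // temporal gradient
                sxx += Ix * Ix; sxy += Ix * Iy; syy += Iy * Iy;
                sxt += Ix * It; syt += Iy * It;
            }
        }
        float det = sxx * syy - sxy * sxy;                 // Cramer's rule
        int idx = y * w + x;
        if (fabsf(det) > 1e-6f) {
            u[idx] = (sxy * syt - syy * sxt) / det;
            v[idx] = (sxy * sxt - sxx * syt) / det;
        } else {
            u[idx] = v[idx] = 0.0f;    // ill-conditioned window: no estimate
        }
    }

Since every pixel's system is independent, no synchronization between threads is required, which is precisely the data parallelism the abstract refers to.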

In this talk, we first describe the NVIDIA CUDA computing framework and then illustrate that the per-pixel nature of the Lucas-Kanade optical flow algorithm allows an efficient mapping onto the multi-threaded architecture of the G80 GPU. To achieve optimal performance, images are divided into sub-frames and a tiled-computation method is used to make efficient use of the available memory bandwidth. In addition, threads are organized so that multiprocessor occupancy is maximized, avoiding unnecessary hardware idling. For large image sizes we ultimately achieve a speedup of 397X over the CPU implementation.
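The tiled-computation idea can be sketched as follows (the 16x16 tile and 2-pixel halo are assumed parameters, not necessarily those used in the talk): each thread block stages its sub-frame into on-chip shared memory once, so the heavily overlapping window reads are served from fast shared memory rather than repeated global-memory accesses:

    #define TILE 16   // assumed tile width; one block computes a TILE x TILE sub-frame
    #define R    2    // halo radius, matching the window radius of the flow kernel

    // Hypothetical tiled kernel skeleton: the block is launched with
    // (TILE + 2R) x (TILE + 2R) threads, each loading one pixel of the tile
    // plus its halo; only the interior TILE x TILE threads compute results.
    __global__ void lkTiled(const float* I0, float* out, int w, int h) {
        __shared__ float s[TILE + 2 * R][TILE + 2 * R];
        int gx = blockIdx.x * TILE + threadIdx.x - R;     // global coordinates
        int gy = blockIdx.y * TILE + threadIdx.y - R;
        int cx = min(max(gx, 0), w - 1);                  // clamp loads at borders
        int cy = min(max(gy, 0), h - 1);
        s[threadIdx.y][threadIdx.x] = I0[cy * w + cx];
        __syncthreads();                                  // tile fully staged

        if (threadIdx.x >= R && threadIdx.x < TILE + R &&
            threadIdx.y >= R && threadIdx.y < TILE + R &&
            gx < w && gy < h) {
            float sum = 0.0f;                             // window sum over shared
            for (int j = -R; j <= R; ++j)                 // memory, standing in for
                for (int i = -R; i <= R; ++i)             // the gradient sums of
                    sum += s[threadIdx.y + j][threadIdx.x + i]; // the flow system
            out[gy * w + gx] = sum;
        }
    }

With these parameters each block uses 20 x 20 = 400 threads and 1.6 KB of shared memory, comfortably within the G80's per-block limits; balancing such sizes is the kind of trade-off the occupancy tuning mentioned above would address.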


Kevin Biswas is a Ph.D. student advised by Professor Ken Mai in Electrical and Computer Engineering at Carnegie Mellon University. He received the B.A.Sc. and M.A.Sc. degrees in Electrical and Computer Engineering from the University of Windsor. His primary research interest is in the area of computer architecture.

 
