This project will use the Carnegie Mellon University Multimodal Activity (CMU-MMAC) data set as a testbed for developing automatic video alignment algorithms. Using the available data, I will develop an algorithm that temporally aligns videos of several people performing the same action at different rates and with different styles.
The project will involve finding robust visual features that efficiently encode the similarity between two video frames. In addition, existing algorithms for aligning one-dimensional signals will be explored and adapted to align sequences of video frames.
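One standard signal-alignment algorithm that could serve as a starting point is dynamic time warping (DTW), which finds a monotonic correspondence between two sequences that minimizes accumulated frame-to-frame distance. The sketch below is illustrative only, assuming each video has already been reduced to a sequence of per-frame feature vectors; the `dtw_align` function and the toy feature sequences are hypothetical, not part of the proposed system.

```python
import numpy as np

def dtw_align(X, Y):
    """Dynamic time warping between two frame-feature sequences.

    X: (n, d) array, Y: (m, d) array of per-frame features.
    Returns (cost, path), where path is a list of (i, j) frame
    correspondences from X to Y.
    """
    n, m = len(X), len(Y)
    # Pairwise Euclidean distances between all frames of X and Y
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost matrix with an extra boundary row/column
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i, j] = D[i - 1, j - 1] + min(C[i - 1, j - 1],  # match
                                            C[i - 1, j],      # skip a frame of X
                                            C[i, j - 1])      # skip a frame of Y
    # Backtrack from the end to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([C[i - 1, j - 1], C[i - 1, j], C[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return C[n, m], path[::-1]

# Toy example: the same 1-D "feature" trajectory performed at two rates
fast = np.array([[0.0], [1.0], [2.0], [3.0]])
slow = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [3.0], [3.0]])
cost, path = dtw_align(fast, slow)
# The slower sequence repeats frames, so a zero-cost alignment exists
```

Because DTW allows a frame in one video to match several frames in the other, it naturally accommodates actions performed at different rates, which is exactly the variation present across CMU-MMAC subjects.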
The CMU-MMAC data set is challenging because subjects were asked to perform kitchen activities as naturally as they would at home. Given the high variability across people executing the same action, activity recognition systems need algorithms for video alignment. The size of the data set requires that such algorithms be automatic, since aligning the videos by hand would be infeasible.