“Spatio-Temporal Video Analysis and Visual Activity Recognition” at the Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA) 2011 Conference in Las Palmas de Gran Canaria. Spain. June 8-10.
My research group is focused on a variety of approaches for (a) low-level video analysis and synthesis and (b) recognizing activities in videos. In this talk, I will concentrate on two of our recent efforts. One effort aimed at robust spatio-temporal segmentation of video and another on using motion and flow to recognize and predict actions from video.
In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this work, we begin by over segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video-retargeting that use spatio-temporal saliency derived from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research).
In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the playing field. To this end, we propose a novel approach to detect the locations of where the play evolution will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach for soccer, basketball and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research).
Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website.