Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa (2010) “Efficient Hierarchical Graph-Based Video Segmentation” in Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010.
We present an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatiotemporal segmentations. This hierarchical approach generates high-quality segmentations that are temporally coherent and have stable region boundaries, and it allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph.
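The pipeline described above (over-segment a voxel graph, build a region graph, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a grayscale video stored as a NumPy array of shape (frames, height, width), a 6-connected space-time grid, absolute intensity difference as the voxel edge weight, and difference of region means as the region-graph edge weight. The merge rule is the standard Felzenszwalb-Huttenlocher criterion, and doubling the scale parameter `k` at each level is an arbitrary choice made for illustration. All function names are hypothetical, and optical flow is not used.

```python
import numpy as np


class DisjointSet:
    """Union-find that also tracks component size and largest merged edge."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # largest edge weight inside each component

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)


def segment_graph(n, edges, k):
    """Greedy graph-based merge; edges is a list of (weight, u, v)."""
    ds = DisjointSet(n)
    for w, u, v in sorted(edges):
        a, b = ds.find(u), ds.find(v)
        if a == b:
            continue
        # Merge when the edge is no heavier than either component tolerates.
        limit = min(ds.internal[a] + k / ds.size[a],
                    ds.internal[b] + k / ds.size[b])
        if w <= limit:
            ds.union(a, b, w)
    roots = [ds.find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)  # labels in 0..m-1
    return labels


def voxel_edges(video):
    """6-connected space-time edges weighted by absolute intensity difference."""
    idx = np.arange(video.size).reshape(video.shape)
    edges = []
    for axis in range(video.ndim):
        vals = np.moveaxis(video, axis, 0)
        ids = np.moveaxis(idx, axis, 0)
        w = np.abs(vals[1:] - vals[:-1]).ravel()
        edges.extend(zip(w.tolist(), ids[:-1].ravel().tolist(),
                         ids[1:].ravel().tolist()))
    return edges


def region_edges(labels, edges, values):
    """Edges between adjacent regions, weighted by difference of region means."""
    n = int(labels.max()) + 1
    means = np.bincount(labels, weights=values, minlength=n) / np.bincount(labels, minlength=n)
    pairs = set()
    for _, u, v in edges:
        a, b = int(labels[u]), int(labels[v])
        if a != b:
            pairs.add((min(a, b), max(a, b)))
    return [(float(abs(means[a] - means[b])), a, b) for a, b in pairs]


def hierarchical_segmentation(video, k=1.0, levels=3):
    """Return a list of label volumes, finest first; each level merges the previous one."""
    video = np.asarray(video, dtype=float)
    values = video.ravel()
    edges = voxel_edges(video)
    labels = segment_graph(video.size, edges, k)  # initial over-segmentation
    hierarchy = [labels.reshape(video.shape)]
    for _ in range(levels - 1):
        k *= 2  # assumption: coarser levels use a larger scale parameter
        r_edges = region_edges(labels, edges, values)
        merged = segment_graph(int(labels.max()) + 1, r_edges, k)
        labels = merged[labels]  # map every voxel to its coarser region
        hierarchy.append(labels.reshape(video.shape))
    return hierarchy
```

Because each level only merges regions from the level below, the returned label volumes are nested, which gives the tree structure the abstract refers to. A caller would pick `hierarchy[i]` for the granularity it needs.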
We also propose two novel approaches to improve the scalability of our technique: (a) a parallel out-of-core algorithm that can process volumes much larger than an in-core algorithm and (b) a clip-based processing algorithm that divides the video into overlapping clips in time and segments them successively while enforcing consistency.
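As a rough illustration of approach (b), the sketch below only shows how a frame range might be divided into clips that overlap in time. The function name, the parameters `clip_len` and `overlap`, and the half-open `(start, end)` convention are assumptions made for this example; the paper's mechanism for enforcing consistency across the shared frames is not reproduced here.

```python
def overlapping_clips(num_frames, clip_len, overlap):
    """Split [0, num_frames) into clips of up to clip_len frames.

    Consecutive clips share `overlap` frames. Each clip is a half-open
    (start, end) pair. Hypothetical helper, not taken from the paper.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0 <= overlap < clip_len:
        raise ValueError("overlap must satisfy 0 <= overlap < clip_len")
    step = clip_len - overlap
    clips = []
    start = 0
    while True:
        end = min(start + clip_len, num_frames)
        clips.append((start, end))
        if end == num_frames:
            break
        start += step
    return clips
```

For example, `overlapping_clips(10, 4, 1)` yields `[(0, 4), (3, 7), (6, 10)]`. Each clip could then be segmented in turn, with the frames it shares with the previous clip serving as the place where labels are reconciled.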
We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.