Paper in ECCV Workshop 2012: “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos”

Paper / Citation

Glenn Hartmann, Matthias Grundmann, Judy Hoffman, David Tsai, Vivek Kwatra, Omid Madani, Sudheendra Vijayanarasimhan, Irfan Essa, James Rehg, Rahul Sukthankar

Weakly Supervised Learning of Object Segmentations from Web-Scale Videos (Best Paper, Proceedings Article)

In: Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media, 2012.

Tags: awards, best paper award, computer vision, ECCV, machine learning


We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as a “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatiotemporal segments. The object seeds obtained using segment-level classifiers are further refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate the automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
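The weak-supervision idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the 2-D synthetic features, the plain logistic-regression classifier, and the function names (train_weak_segment_classifier, select_object_seeds, demo) are all assumptions made for illustration. Every segment inherits the tag of its video as a noisy label, a classifier is trained on those labels, and the highest-scoring segments from tagged videos are kept as object seeds. The graph-cut refinement step is omitted.

```python
import numpy as np


def train_weak_segment_classifier(features, video_tagged, lr=0.1, epochs=300):
    """Fit logistic regression by gradient descent on noisy, video-level labels.

    Each segment is labeled 1 if its video carries the tag, else 0, so
    background segments from tagged videos act as label noise.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(video_tagged, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        err = p - y                              # gradient of the log loss
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def select_object_seeds(features, w, b, top_fraction=0.1):
    """Return indices of the highest-scoring segments (hypothetical seed rule)."""
    scores = np.asarray(features, dtype=float) @ w + b
    k = max(1, int(top_fraction * len(scores)))
    return np.argsort(scores)[::-1][:k]


def demo(seed=0):
    """Synthetic check: do the seeds land on true object segments?"""
    rng = np.random.default_rng(seed)
    # Tagged videos contain object segments AND background segments.
    obj = rng.normal(3.0, 0.5, size=(200, 2))
    bg_tagged = rng.normal(0.0, 0.5, size=(200, 2))
    # Untagged videos contain background only.
    bg_other = rng.normal(0.0, 0.5, size=(400, 2))

    tagged = np.vstack([obj, bg_tagged])
    X = np.vstack([tagged, bg_other])
    y = np.concatenate([np.ones(len(tagged)), np.zeros(len(bg_other))])

    w, b = train_weak_segment_classifier(X, y)
    seeds = select_object_seeds(tagged, w, b, top_fraction=0.1)

    is_object = np.arange(len(tagged)) < len(obj)  # ground truth, demo only
    return float(is_object[seeds].mean())          # seed precision


if __name__ == "__main__":
    print("seed precision:", demo())
```

In this toy setting the background segments look the same in tagged and untagged videos, so the classifier can only separate the classes through the object segments, which is why the top-scoring seeds fall on the object despite the noisy labels.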
