Abstract We introduce HowToCut, an automatic approach that converts a Markdown-formatted tutorial into an interactive video that presents the visual instructions with a synthesized voiceover for narration. HowToCut extracts instructional content from a multimedia document that describes a step-by-step procedure. Our method selects and converts text instructions to a voiceover. It makes automatic editing decisions […]
Paper in ACM CHI 2021 on “Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos”
We present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that coarse-grained events correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to automatically identify the fine-level action steps and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of our segmentation algorithms and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI, and that they find our format much easier to navigate.
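The grouping step described above can be sketched in a few lines: given a list of fine-level action steps (each already labeled with a facial part by the paper's vision-and-transcript heuristics), consecutive steps on the same facial part are merged into one coarse-level event. The step data, field names, and function below are hypothetical illustrations, not the paper's actual implementation.

```python
from itertools import groupby

# Hypothetical fine-level steps: (start_sec, end_sec, facial_part, action).
# In the paper these come from vision + transcript heuristics; here they
# are hard-coded purely for illustration.
steps = [
    (0, 12, "eyes", "apply primer"),
    (12, 30, "eyes", "blend eyeshadow"),
    (30, 41, "eyes", "draw eyeliner"),
    (41, 55, "lips", "line lips"),
    (55, 70, "lips", "apply lipstick"),
    (70, 88, "cheeks", "apply blush"),
]

def group_by_facial_part(steps):
    """Merge consecutive fine-level steps on the same facial part
    into one coarse-level event spanning their combined interval."""
    events = []
    for part, run in groupby(steps, key=lambda s: s[2]):
        run = list(run)
        events.append({
            "facial_part": part,
            "start": run[0][0],
            "end": run[-1][1],
            "steps": [s[3] for s in run],
        })
    return events

for ev in group_by_facial_part(steps):
    print(f"{ev['facial_part']}: {ev['start']}-{ev['end']}s, {len(ev['steps'])} steps")
# → eyes: 0-41s, 3 steps / lips: 41-70s, 2 steps / cheeks: 70-88s, 1 steps
```

Note that `groupby` only merges *consecutive* runs, which matches the two-level hierarchy here: if the presenter returns to a facial part later in the video, that becomes a separate coarse event rather than being folded into the earlier one.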
Paper Abstract The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cues […]
Paper in Ubicomp 2015: "A Practical Approach for Recognizing Eating Moments with Wrist-Mounted Inertial Sensing"
Paper Abstract Recognizing when eating activities take place is one of the key challenges in automated food intake monitoring. Despite progress over the years, most proposed approaches have been largely impractical for everyday usage, requiring multiple on-body sensors or specialized devices such as neck collars for swallow detection. In this paper, we describe the implementation […]
Paper in ACM IUI 2015: “Inferring Meal Eating Activities in Real-World Settings from Ambient Sounds: A Feasibility Study”
Abstract Dietary self-monitoring has been shown to be an effective method for weight loss, but it remains an onerous task despite recent advances in food journaling systems. Semi-automated food journaling can reduce the effort of logging, but often requires that eating activities be detected automatically. In this work we describe results from a feasibility study […]
Paper in ACM Ubicomp 2013 "Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras"
Abstract First-person point-of-view (FPPOV) images taken by wearable cameras can be used to better understand people’s eating habits. Human computation is a way to provide effective analysis of FPPOV images in cases where algorithmic approaches currently fail. However, privacy is a serious concern. We provide a framework, the privacy-saliency matrix, for understanding the balance between […]
Paper in ACM KDD 2013 “Detecting insider threats in a real corporate database of computer usage activity”
Abstract This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations’ information systems. Our system combines structural and semantic information from a real corporate database of […]
At the ACM-sponsored 14th International Conference on Ubiquitous Computing (Ubicomp 2012), Pittsburgh, PA, September 5–7, 2012. Here are the highlights of my group’s participation in Ubicomp 2012. E. Thomaz, V. Bettadapura, G. Reyes, M. Sandesh, G. Schindler, T. Ploetz, G. D. Abowd, and I. Essa (2012), “Recognizing Water-Based Activities in the Home Through Infrastructure-Mediated […]
Paper (2009) ACM CHI: "Videolyzer: Quality Analysis of Online Informational Video for Bloggers and Journalists"
N. Diakopoulos, S. Goldenberg, I. Essa (2009). “Videolyzer: Quality Analysis of Online Informational Video for Bloggers and Journalists.” ACM Conference on Human Factors in Computing Systems (CHI). April, 2009. [PDF] [Project Site] [Video] Abstract Tools to aid people in making sense of the information quality […]
Matthew Flagg, Atsushi Nakazawa, Qiushuang Zhang, Sing Bing Kang, Young Kee Ryu, Irfan Essa, James M. Rehg (2009), Human Video Textures In Proceedings of the ACM Symposium on Interactive 3D Graphics and Games 2009 (I3D ’09), Boston, MA, February 27–March 1 (Fri–Sun), 2009 [PDF (see Copyright) | Video in DivX | Website] Abstract This paper describes a data-driven approach […]