We introduce HowToCut, an automatic approach that converts a Markdown-formatted tutorial into an interactive video presenting the visual instructions with a synthesized voiceover for narration. HowToCut extracts instructional content from a multimedia document that describes a step-by-step procedure, selects text instructions, and converts them into a voiceover. It makes automatic editing decisions […]
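To make the pipeline concrete, here is a minimal sketch of the extraction stage, assuming a hypothetical tutorial format of numbered Markdown steps with inline images. The names (`Step`, `extract_steps`) and parsing rules are illustrative, not HowToCut's actual implementation.

```python
# A minimal sketch, assuming numbered Markdown steps each followed by an
# image. Illustrative only -- not HowToCut's API or parsing logic.
import re
from dataclasses import dataclass

@dataclass
class Step:
    text: str          # instruction sentence, candidate for the voiceover
    image: str | None  # visual shown while the step is narrated

STEP_RE = re.compile(r"^\d+\.\s+(?P<text>.+)$")
IMG_RE = re.compile(r"!\[[^\]]*\]\((?P<src>[^)]+)\)")

def extract_steps(markdown: str) -> list[Step]:
    """Pair each numbered instruction with the image that follows it."""
    steps: list[Step] = []
    for line in markdown.splitlines():
        if m := STEP_RE.match(line.strip()):
            steps.append(Step(text=m.group("text"), image=None))
        elif (m := IMG_RE.search(line)) and steps:
            steps[-1].image = m.group("src")
    return steps

tutorial = """\
1. Whisk the eggs until frothy.
![whisking](img/whisk.jpg)
2. Pour the mixture into the pan.
![pouring](img/pour.jpg)
"""
for i, step in enumerate(extract_steps(tutorial), 1):
    print(f"Shot {i}: show {step.image}, narrate: '{step.text}'")
```

Each extracted step pairs a narration candidate with a visual, which is the raw material the editing stage then sequences into shots.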
Paper in ACM CHI 2021 on “Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos”
We present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that coarse-grained events correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to automatically identify the fine-level action steps and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of our segmentation algorithm and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI and that they find our format much easier to navigate.
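To illustrate the grouping heuristic, here is a minimal sketch that merges already-detected fine-level action steps into coarse-level events whenever consecutive steps act on the same facial part. The tuple format is an assumption for illustration, not the paper's data model.

```python
# A minimal sketch of the grouping heuristic, assuming the fine-level step
# detector (built from vision + transcript cues in the paper) emits tuples
# of (facial_part, action, start_sec, end_sec). Values are illustrative.
from itertools import groupby

steps = [
    ("eyes", "apply primer", 12, 30),
    ("eyes", "blend eyeshadow", 30, 75),
    ("lips", "line lips", 75, 90),
    ("lips", "apply lipstick", 90, 120),
]

hierarchy = []
for part, group in groupby(steps, key=lambda s: s[0]):
    group = list(group)
    hierarchy.append({
        "facial_part": part,                  # coarse-level event (object)
        "steps": group,                       # fine-level events (actions)
        "span": (group[0][2], group[-1][3]),  # seconds of video covered
    })

for event in hierarchy:
    print(event["facial_part"], event["span"], f"{len(event['steps'])} steps")
```

Grouping only consecutive steps preserves the video's temporal order, so each coarse event maps to one contiguous segment the UI can jump to.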
We now have Google Research based right here in Atlanta (Google Research, Atlanta), and we are hiring in computer vision, machine learning, artificial intelligence, and human-computer interaction, with a specific focus on content/video understanding and creation. Here's a bit more info for folks who are interested: I am establishing a research and advanced development team […]
Paper in AAAI 2021 on “Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views”
We study the task of semantic mapping – specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map (‘what is where?’) from egocentric observations of an RGB-D camera with known pose (via localization sensors). Importantly, our goal is to build neural episodic memories and spatio-semantic representations of 3D spaces that enable the agent to easily learn subsequent tasks in the same space – navigating to objects seen during the tour (‘Find chair’) or answering questions about the space (‘How many chairs did you see in the house?’).
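As a concrete illustration of the geometric core of this task, here is a small numpy sketch that projects per-pixel semantic labels from one posed egocentric RGB-D frame into an allocentric top-down grid. The intrinsics and pose conventions are assumptions, and the paper's method builds learned neural spatio-semantic representations rather than this direct label scatter.

```python
# A minimal sketch, assuming a pinhole camera with intrinsics K, a 4x4
# camera-to-world pose, and a world frame with y up (top-down uses x/z).
# Illustrative geometry only -- not Semantic MapNet's learned pipeline.
import numpy as np

def project_to_topdown(depth, labels, K, T_world_cam, grid, cell=0.05):
    """depth: HxW meters; labels: HxW ints; K: 3x3 intrinsics;
    T_world_cam: 4x4 camera-to-world pose; grid: NxN label map (mutated)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    # Unproject pixels to camera-frame 3D points.
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)])   # 4 x HW homogeneous
    pts_world = T_world_cam @ pts_cam                # allocentric frame
    # Bin world x/z coordinates into top-down grid cells around the origin.
    gx = (pts_world[0] / cell).astype(int) + grid.shape[1] // 2
    gy = (pts_world[2] / cell).astype(int) + grid.shape[0] // 2
    ok = (z > 0) & (gx >= 0) & (gx < grid.shape[1]) \
         & (gy >= 0) & (gy < grid.shape[0])
    grid[gy[ok], gx[ok]] = labels.ravel()[ok]
    return grid
```

Accumulating such projections over the frames of a tour yields a 'what is where?' map that downstream tasks like 'Find chair' can query directly.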
Creating marketing videos from scratch can be challenging, especially when designing for multiple platforms with different viewing criteria. We present URL2Video, an automatic approach that converts a web page into a short video given temporal and visual constraints. URL2Video captures quality materials and design styles from a web page, including fonts, colors, and layouts. Using constraint programming, URL2Video's design engine organizes the visual assets into a sequence of shots and renders them into a video with a user-specified aspect ratio and duration. Creators can review the video composition, modify constraints, and generate video variations through a user interface. We learned the design process from designers and compared our automatically generated results with their creations through interviews and an online survey. The evaluation shows that URL2Video effectively extracted design elements from a web page and supported designers by bootstrapping the video creation process.
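To give a flavor of the constraint formulation, here is a deliberately simplified sketch that searches for per-shot durations satisfying a user-specified total duration. URL2Video's design engine uses constraint programming over much richer design constraints; the asset names and duration limits below are illustrative.

```python
# A simplified sketch of the constraint idea: choose per-shot durations so
# the total matches a user-specified length while each shot stays within
# limits derived from its content. Brute force stands in for the real
# constraint solver; shots and bounds are illustrative assumptions.
from itertools import product

shots = [  # (asset, min_sec, max_sec)
    ("headline", 2, 4),
    ("hero_image", 2, 5),
    ("call_to_action", 1, 3),
]
target = 9  # user-specified total duration in seconds

solutions = [
    dict(zip((name for name, _, _ in shots), durs))
    for durs in product(*(range(lo, hi + 1) for _, lo, hi in shots))
    if sum(durs) == target
]
print(solutions[0])  # {'headline': 2, 'hero_image': 4, 'call_to_action': 3}
```

Framing the edit as constraint satisfaction is what lets creators change the aspect ratio or duration and regenerate a valid composition, rather than re-editing by hand.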
Graphic design is essential for visual communication, and layouts are fundamental to composing attractive designs. Layout generation differs from pixel-level image synthesis in that it must satisfy mutual relations among the desired components. We propose a method for design layout generation that can satisfy user-specified constraints.
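To make "user-specified constraints" concrete, here is a minimal sketch expressing relational constraints over component bounding boxes and checking a candidate layout against them. The predicates and coordinates are illustrative assumptions, not the paper's model; a generator would keep only candidates that satisfy all such relations.

```python
# A minimal sketch, assuming components are axis-aligned boxes in
# normalized page coordinates. Predicate names are illustrative.
Box = tuple[float, float, float, float]  # (x, y, width, height)

def above(a: Box, b: Box) -> bool:
    # a's bottom edge sits at or above b's top edge
    return a[1] + a[3] <= b[1]

def no_overlap(a: Box, b: Box) -> bool:
    # boxes are separated along at least one axis
    return (a[0] + a[2] <= b[0] or b[0] + b[2] <= a[0] or
            a[1] + a[3] <= b[1] or b[1] + b[3] <= a[1])

layout = {"title": (0.1, 0.05, 0.8, 0.1), "image": (0.1, 0.2, 0.8, 0.5)}
constraints = [above(layout["title"], layout["image"]),
               no_overlap(layout["title"], layout["image"])]
print(all(constraints))  # True: this candidate satisfies both relations
```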
Honored to have been asked to moderate a panel of ML@GT researchers who stepped up to respond to the COVID-19 crisis. See the video of the panel below. The coronavirus (COVID-19) pandemic has wreaked havoc on the world, spurring researchers across disciplines into action to help humankind. Four researchers affiliated with the Machine Learning Center at […]
Honored to have been invited to speak at the inaugural workshop at CVPR 2020 on “AI for Content Creation.” As CVPR 2020 went online, so did this workshop. I gave a talk on “AI (CV/ML) for Content Creation.” More information on the workshop is below. The AI for Content Creation workshop (AICCW) at CVPR 2020 brings […]
Keynote Speaker at CNS/AANS Spine Summit 2020, Las Vegas, Nevada, March 6, 2020, on the topic of “Data-driven Innovation”
Honored to have been invited as the keynote/guest speaker at Spine Summit 2020, the 36th Annual Meeting of the American Association of Neurological Surgeons (AANS) and the Congress of Neurological Surgeons (CNS), on March 6, 2020, at the Cosmopolitan of Las Vegas in Las Vegas, Nevada, USA. Here is the full program of the conference. […]