October 12, 2021 / Last updated : March 15, 2023 irfan UIST

Paper in UIST 2021 on “Automatic Instructional Video Creation from a Markdown-formatted Tutorial”

Abstract: We introduce HowToCut, an automatic approach that converts a Markdown-formatted tutorial into an interactive video presenting visual instructions with a synthesized voiceover for narration. HowToCut extracts instructional content from a multimedia document that describes a step-by-step procedure. Our method selects and converts text instructions to a voiceover. It makes automatic editing decisions to align […]

June 1, 2021 / Last updated : March 25, 2023 irfan IMWUT

Paper in IMWUT 2021 on “Contrastive Predictive Coding for Human Activity Recognition”

Feature extraction is crucial for human activity recognition (HAR) using body-worn movement sensors. Recently, learned representations have been used successfully, offering promising alternatives to manually engineered features. Our work focuses on effective use of small amounts of labeled data and the opportunistic exploitation of unlabeled data that are straightforward to collect in mobile and ubiquitous computing scenarios. …

February 25, 2021 / Last updated : March 15, 2023 irfan CHI

Paper in ACM CHI 2021 on “Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos”

We present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that objects correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to automatically identify the fine-level action steps and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of segmentation algorithms and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI and that they find our format much easier to navigate.

February 15, 2021 / Last updated : April 6, 2022 irfan Google

Research Opportunities at Google Atlanta

We now have Google Research based right here in Atlanta (Google Research, Atlanta), and we are hiring in computer vision, machine learning, artificial intelligence, and human-computer interaction, with a specific focus on content/video understanding and creation. Here’s a bit more info for folks who are interested: I am establishing a research and advanced development team […]

February 2, 2021 / Last updated : March 20, 2023 irfan AAAI

Paper in AAAI 2021 on “Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views”

We study the task of semantic mapping – specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map (‘what is where?’) from egocentric observations of an RGB-D camera with known pose (via localization sensors). Importantly, our goal is to build neural episodic memories and spatio-semantic representations of 3D spaces that enable the agent to easily learn subsequent tasks in the same space – navigating to objects seen during the tour (‘Find chair’) or answering questions about the space (‘How many chairs did you see in the house?’).

October 28, 2020 / Last updated : March 20, 2023 irfan UIST

Paper in ACM UIST 2020 on “Automatic Video Creation From a Web Page”

Creating marketing videos from scratch can be challenging, especially when designing for multiple platforms with different viewing criteria. We present URL2Video, an automatic approach that converts a web page into a short video given temporal and visual constraints. URL2Video captures quality materials and design styles extracted from a web page, including fonts, colors, and layouts. Using constraint programming, URL2Video’s design engine organizes the visual assets into a sequence of shots and renders to a video with a user-specified aspect ratio and duration. Creators can review the video composition, modify constraints, and generate video variation through a user interface. We learned the design process from designers and compared our automatically generated results with their creation through interviews and an online survey. The evaluation shows that URL2Video effectively extracted design elements from a web page and supported designers by bootstrapping the video creation process.

September 1, 2020 / Last updated : March 25, 2023 irfan ISWC

Paper in ISWC 2020 on “Masked reconstruction based self-supervision for human activity recognition”

The ubiquitous availability of wearable sensing devices has rendered large scale collection of movement data a straightforward endeavor. Yet, annotation of these data remains a challenge and as such, publicly available datasets for human activity recognition (HAR) are typically limited in size as well as in variability, which constrains HAR model training and effectiveness. We introduce …

August 25, 2020 / Last updated : March 20, 2023 irfan ECCV

Paper in ECCV 2020 on “Neural Design Network: Graphic Layout Generation with Constraints”

Graphic design is essential for visual communication with layouts being fundamental to composing attractive designs. Layout generation differs from pixel-level image synthesis and is unique in terms of the requirement of mutual relations among the desired components. We propose a method for design layout generation that can satisfy user-specified constraints.

June 24, 2020 / Last updated : February 21, 2021 irfan Events

Panel of ML@GT Researchers working on Covid-19 Relief

Honored to have been asked to moderate a panel of ML@GT researchers who stepped up to respond to the COVID-19 crisis. See the video of the panel below. The coronavirus (COVID-19) pandemic has wreaked havoc on the world, spurring researchers across disciplines into action to help humankind. Four researchers affiliated with the Machine Learning Center at […]

June 15, 2020 / Last updated : March 20, 2023 irfan CVPR

Invited Speaker at CVPR 2020 Workshop on “AI for Content Creation”

Honored to have been invited to speak at the inaugural workshop at CVPR 2020 on “AI for Content Creation.” As CVPR 2020 went online, so did this workshop. I gave a talk on “AI (CV/ML) for Content Creation.” More information on the workshop: The AI for Content Creation workshop (AICCW) at CVPR 2020 brings […]
