Paper in ACM CHI 2021 on “Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos”

Abstract

We present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that objects correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to identify the fine-level action steps automatically and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of segmentation algorithms and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI and that they find our format much easier to navigate.
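
The two-level structure described in the abstract can be pictured as a small data structure. The sketch below is illustrative only and is not the authors' implementation: the `ActionStep` fields, the `build_hierarchy` helper, and the sample steps are all hypothetical, and it assumes the fine-level steps have already been detected and labeled with a facial part. It also assumes that consecutive steps sharing a facial part form one coarse-level event, which the abstract does not specify.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class ActionStep:
    """A fine-level step (hypothetical fields, not taken from the paper)."""
    start: float       # start time in seconds
    end: float         # end time in seconds
    facial_part: str   # coarse-level label, e.g. "eyes"
    action: str        # fine-level description, e.g. "blend eyeshadow"


def build_hierarchy(steps):
    """Group consecutive steps that share a facial part into coarse events.

    Assumption: a coarse event is a run of adjacent steps on the same
    facial part; the paper's actual grouping heuristics may differ.
    """
    ordered = sorted(steps, key=lambda s: s.start)
    hierarchy = []
    for part, run in groupby(ordered, key=lambda s: s.facial_part):
        run = list(run)
        hierarchy.append({
            "facial_part": part,
            "start": run[0].start,
            "end": run[-1].end,
            "steps": run,
        })
    return hierarchy


# Made-up sample data for illustration only.
steps = [
    ActionStep(0.0, 12.5, "eyes", "apply primer"),
    ActionStep(12.5, 40.0, "eyes", "blend eyeshadow"),
    ActionStep(40.0, 65.0, "lips", "line lips"),
    ActionStep(65.0, 80.0, "lips", "apply lipstick"),
]
hierarchy = build_hierarchy(steps)

for event in hierarchy:
    print(event["facial_part"], event["start"], event["end"])
    for step in event["steps"]:
        print("   ", step.action)
```

A navigation UI could then jump to `event["start"]` for a facial-part selection or to `step.start` for an individual action step.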

Paper / Citation

Anh Truong, Peggy Chi, David Salesin, Irfan Essa, Maneesh Agrawala

Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos Proceedings Article

In: ACM CHI Conference on Human Factors in Computing Systems, 2021.

Tags: CHI, computational video, google, human-computer interaction, video summarization

@inproceedings{2021-Truong-AGTHTFIMV,

title = {Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos},

author = {Anh Truong and Peggy Chi and David Salesin and Irfan Essa and Maneesh Agrawala},

url = {https://dl.acm.org/doi/10.1145/3411764.3445721

https://research.google/pubs/pub50007/

http://anhtruong.org/makeup_breakdown/},

doi = {10.1145/3411764.3445721},

year  = {2021},

date = {2021-05-01},

urldate = {2021-05-01},

booktitle = {ACM CHI Conference on Human Factors in Computing Systems},

abstract = {We present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that objects correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to automatically identify the fine-level action steps and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of segmentation algorithms and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI and that they find our format much easier to navigate.},

keywords = {CHI, computational video, google, human-computer interaction, video summarization},

pubstate = {published},

tppubtype = {inproceedings}

}
