Paper in UIST 2021 on “Automatic Instructional Video Creation from a Markdown-formatted Tutorial”

Abstract

We introduce HowToCut, an automatic approach that converts a Markdown-formatted tutorial into an interactive video presenting visual instructions with a synthesized voiceover for narration. HowToCut extracts instructional content from a multimedia document that describes a step-by-step procedure. Our method selects text instructions and converts them to a voiceover. It makes automatic editing decisions to align the narration with edited visual assets, including step images, videos, and text overlays. We derive our video editing strategies from an analysis of 125 web tutorials and by applying computer vision techniques to the assets. HowToCut’s conversational UI presents instructions in multiple formats in response to user commands, enabling viewers to navigate the tutorial interactively. We evaluated our automatically generated video tutorials through user studies (N=20) and validated the video quality via an online survey (N=93). The evaluation shows that our method effectively created informative and helpful instructional videos from a web tutorial document for both reviewing and following.
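
To make the extraction step concrete, the following is a minimal sketch of how a Markdown tutorial might be split into steps, each with its narration text and image assets. It is an illustration under stated assumptions, not HowToCut's implementation: the `Step` class, the function names, the rule that every heading starts a new step, and the 150 words-per-minute speaking rate are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical data model (not from the paper): one tutorial step
# with its candidate narration text and its image assets.
@dataclass
class Step:
    title: str
    text: list = field(default_factory=list)
    images: list = field(default_factory=list)

# Inline Markdown image: ![alt](url "optional title"); captures the URL.
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")
# ATX heading: one to six '#' characters followed by the heading text.
HEADING_RE = re.compile(r"^#{1,6}\s+(.*)$")

def parse_tutorial(markdown):
    """Split a tutorial into steps, assuming each heading starts a step."""
    steps = []
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING_RE.match(line)
        if heading:
            current = Step(title=heading.group(1).strip())
            steps.append(current)
            continue
        if current is None:
            continue  # ignore any preamble before the first heading
        current.images.extend(IMAGE_RE.findall(line))
        prose = IMAGE_RE.sub("", line).strip()
        if prose:
            current.text.append(prose)
    return steps

def narration_seconds(step, words_per_minute=150):
    """Rough voiceover length; the speaking rate is an assumed constant."""
    words = sum(len(t.split()) for t in step.text)
    return words * 60.0 / words_per_minute

tutorial = (
    "# Step 1: Cut\n"
    "Cut the paper in half.\n"
    "![cut](cut.png)\n"
    "\n"
    "# Step 2: Fold\n"
    'Fold along the line. ![fold](fold.jpg "Fold")\n'
)
for step in parse_tutorial(tutorial):
    print(step.title, step.images, round(narration_seconds(step), 1))
```

An estimate such as `narration_seconds` could then serve as the target duration when deciding how long to hold each step's visual assets on screen, which is one simple way to think about aligning narration with visuals.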