Video-Annotated Augmented Reality Assembly Tutorials

User Interface Software and Technology (UIST 2020)

Masahiro Yamaguchi
Keio University
Graz University of Technology
Shohei Mori
Graz University of Technology
Peter Mohr
Graz University of Technology
VRVis GmbH
Markus Tatzgern
Salzburg University of Applied Sciences
Ana Stanescu
Graz University of Technology
Hideo Saito
Keio University
Denis Kalkofen
Graz University of Technology


We present a system for generating and visualizing interactive 3D Augmented Reality tutorials based on 2D video input, which allows viewpoint control at runtime. Inspired by assembly planning, we analyze the input video using a 3D CAD model of the object to determine an assembly graph that encodes blocking relationships between parts. Using an assembly graph enables us to detect assembly steps that are otherwise difficult to extract from the video, and generally improves object detection and tracking by providing prior knowledge about movable parts. To avoid information loss, we combine the 3D animation with relevant parts of the 2D video so that we can show detailed manipulations and tool usage that cannot be easily extracted from the video. To further support user orientation, we visually align the 3D animation with the real-world object by using texture information from the input video. We developed a presentation system that uses commonly available hardware to make our results accessible for home use and demonstrate the effectiveness of our approach by comparing it to traditional video tutorials.
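The following sketch shows how an assembly graph can constrain the order of steps and narrow down which parts may move. It assumes the blocking relationships have already been extracted as precedence pairs (a, b), meaning part a must be mounted before part b because b, once in place, would block a. Parts are ordered with Kahn's topological sort, and the parts that may move next are listed given those already assembled. The part names, the function names, and the choice of Kahn's algorithm are hypothetical illustrations, not the authors' implementation.

```python
import heapq


def assembly_sequence(parts, precedes):
    """Return one feasible assembly order (Kahn's topological sort).

    parts:    iterable of part identifiers (hypothetical names).
    precedes: iterable of (before, after) pairs; 'after' blocks 'before'
              once mounted, so 'before' must be assembled first.
    Raises ValueError if the blocking relations contain a cycle.
    """
    indegree = {p: 0 for p in parts}
    successors = {p: [] for p in parts}
    for before, after in precedes:
        successors[before].append(after)
        indegree[after] += 1

    # A min-heap makes the order deterministic when several parts are free.
    ready = [p for p, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        part = heapq.heappop(ready)
        order.append(part)
        for nxt in successors[part]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)

    if len(order) != len(indegree):
        raise ValueError("blocking relations contain a cycle")
    return order


def next_candidates(parts, precedes, assembled):
    """Parts not yet assembled whose prerequisites are all in place.

    This is the kind of prior knowledge that can restrict detection and
    tracking to the parts that are actually movable at the current step.
    """
    assembled = set(assembled)
    return sorted(
        p for p in parts
        if p not in assembled
        and all(before in assembled for before, after in precedes if after == p)
    )


if __name__ == "__main__":
    parts = ["base", "shelf", "side_left", "side_right", "top"]
    precedes = [("base", "side_left"), ("base", "side_right"),
                ("side_left", "shelf"), ("side_right", "shelf"),
                ("shelf", "top")]
    print(assembly_sequence(parts, precedes))
    # ['base', 'side_left', 'side_right', 'shelf', 'top']
    print(next_candidates(parts, precedes, {"base"}))
    # ['side_left', 'side_right']
```

When several parts are free at once (here the two side panels), any of them is a valid next step. The heap only fixes one order for reproducibility.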


Video-annotated AR Tutorial

  • AR mirror as a 3D interactive tutorial
  • Ordered and animated 3D parts
  • Accompanying video annotations for detailed instructions that cannot be detected automatically


  1. Prepare a video recording of an assembly and a 3D CAD model
  2. Run our algorithm to extract an assembly graph, an assembly sequence, part animations, and accompanying video annotations from the input data
  3. Display the ordered 3D animations and video annotations on an AR mirror
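Step 2 can be pictured roughly as follows, assuming a tracker has already reported, for each part, the frame range in which that part was observed moving. The hypothetical `build_steps` function orders these detections in time, checks them against the precedence pairs of the assembly graph, and attaches the matching video interval to each step as its annotation clip. The detection format, the `TutorialStep` data class, and giving every step a clip are simplifying assumptions of this sketch; the actual system only keeps the parts of the video that carry information the 3D animation cannot show.

```python
from dataclasses import dataclass


@dataclass
class TutorialStep:
    part: str          # hypothetical part identifier from the CAD model
    clip_start: float  # seconds into the input video
    clip_end: float    # seconds into the input video


def build_steps(detections, fps, precedes):
    """Turn per-part detections into ordered tutorial steps.

    detections: list of (part, first_frame, last_frame), one entry per part,
                marking when the part was seen moving (assumed tracker output).
    fps:        frame rate of the input video.
    precedes:   (before, after) pairs from the assembly graph.
    """
    ordered = sorted(detections, key=lambda d: d[1])
    position = {part: i for i, (part, _, _) in enumerate(ordered)}

    # Reject detection orders that contradict the blocking relationships.
    for before, after in precedes:
        if before in position and after in position and position[before] > position[after]:
            raise ValueError(f"{after!r} detected before {before!r}, violating the assembly graph")

    return [TutorialStep(part, first / fps, last / fps) for part, first, last in ordered]


if __name__ == "__main__":
    detections = [("side_left", 300, 420), ("base", 0, 150), ("side_right", 450, 600)]
    precedes = [("base", "side_left"), ("base", "side_right")]
    for step in build_steps(detections, fps=30, precedes=precedes):
        print(step)
```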

Design Recommendations

  • Step-by-step animated instructions: Let the user select the playback speed of each instruction
  • Free-viewpoint instructions: Allow the user to align the real-world objects with the virtual ones
  • Video-annotated AR instructions: Use video clips to retain visual and motion cues that are hard to extract
  • Video texturing: Align the appearance of virtual and real objects to facilitate visual search
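Video texturing requires knowing where each point of the 3D model appears in a video frame. Assuming a calibrated pinhole camera with intrinsics K and an object pose (R, t) obtained from tracking, a minimal projective texture lookup maps model vertices to normalized texture coordinates. The function name, the v-axis flip (image rows grow downward, texture v grows upward), and the omission of occlusion handling are assumptions of this sketch, not details taken from the paper.

```python
def project_to_uv(vertices, K, R, t, width, height):
    """Map 3D model vertices to normalized (u, v) coordinates in a video frame.

    vertices: list of (x, y, z) in model coordinates.
    K:        3x3 camera intrinsics (nested lists).
    R, t:     3x3 rotation and length-3 translation, model -> camera.
    Returns one entry per vertex: (u, v) in [0, 1), or None if the vertex is
    behind the camera or projects outside the frame. Occlusion is ignored.
    """
    uvs = []
    for X in vertices:
        # Camera coordinates: Xc = R X + t
        Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        if Xc[2] <= 0:
            uvs.append(None)  # behind the camera
            continue
        # Pinhole projection p = K Xc, followed by the perspective divide
        p = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
        px, py = p[0] / p[2], p[1] / p[2]
        if not (0 <= px < width and 0 <= py < height):
            uvs.append(None)  # outside the video frame
            continue
        # Normalize to texture space; flip v because image rows grow downward
        uvs.append((px / width, 1.0 - py / height))
    return uvs


if __name__ == "__main__":
    K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]]
    R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    t = [0, 0, 2]
    print(project_to_uv([(0, 0, 0), (0.4, 0, 0), (0, 0, -3)], K, R, t, 640, 480))
    # [(0.5, 0.5), (0.65625, 0.5), None]
```

In practice, a vertex that passes these checks can still be hidden by another part, so a depth test against the rendered model would be needed before sampling the video color.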


	author = {Yamaguchi, Masahiro and Mori, Shohei and Mohr, Peter and Tatzgern, Markus and Stanescu, Ana and Saito, Hideo and Kalkofen, Denis},
	booktitle = {ACM Symposium on User Interface Software and Technology},
	title = {Video-Annotated Augmented Reality Assembly Tutorials},