“…To help video consumers skim and navigate to content of interest, prior work introduced approaches to navigate videos based on transcripts [33,54,55], high-level chapters and scenes [13,19,34,54,56,80,84], or key objects and concepts [12,44,59]. While transcripts help users efficiently search for words used in the video [33,54,55], they can be difficult to skim as they are often long, unstructured, and contain disfluencies present in speech [56].…”