“…Grounded in NLP, computational narrative understanding has largely prioritized written narratives for understandable reasons. However, such textdriven approaches leave out large portions of storytelling behavior, including movies and television (Arnold et al, 2019;Papalampidi et al, 2019), usergenerated streaming content, illustrated content in comic strips (Edlin and Reiss, 2023), graphic novels, or children's books (Adukia et al, 2021), and finally video games, which might have stronger or weaker narrative structures. While textual narratives are largely unimodal in nature (though the physical and visual dimensions of books has been a vibrant area of study for a long time (Collective, 2019)), these other narrative forms are all crucially multi-modal in nature.…”