2018
DOI: 10.1609/aaai.v32i1.12253

Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents

Abstract: Movies provide us with a mass of visual content as well as attracting stories. Existing methods have illustrated that understanding movie stories through only visual content is still a hard problem. In this paper, for answering questions about movies, we put forward a Layered Memory Network (LMN) that represents frame-level and clip-level movie content by the Static Word Memory module and the Dynamic Subtitle Memory module, respectively. Particularly, we firstly extract words and sentences from the training mo…
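The abstract describes a two-level memory design: frame-level content grounded in a Static Word Memory and clip-level content grounded in a Dynamic Subtitle Memory. The plain-Python sketch below only illustrates that general idea of layered attention over two memories; it is not the authors' implementation, and every dimension, tensor name, and the simple additive fusion used here are assumptions made for illustration.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax over attention scores.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attend(query, memory):
        # Dot-product attention: weight memory slots by similarity to the query,
        # then return the weighted sum of the slots.
        scores = memory @ query          # (num_slots,)
        weights = softmax(scores)        # (num_slots,)
        return weights @ memory          # (dim,)

    # Hypothetical shared embedding size and toy inputs.
    d = 64
    rng = np.random.default_rng(0)
    frame_feat   = rng.normal(size=d)         # one frame's visual feature, projected to d
    word_memory  = rng.normal(size=(100, d))  # "static word memory": word embeddings
    subtitle_mem = rng.normal(size=(20, d))   # "dynamic subtitle memory": subtitle sentence embeddings
    question     = rng.normal(size=d)         # encoded question
    answers      = rng.normal(size=(5, d))    # encoded candidate answers

    # Frame level: describe the frame by the words it attends to.
    frame_repr = attend(frame_feat, word_memory)

    # Clip level: combine the frame-level representation with the clip's subtitles.
    clip_repr = attend(frame_repr + question, subtitle_mem)

    # Score each candidate answer against the fused movie/question representation.
    scores = answers @ (clip_repr + question)
    print("predicted answer index:", int(np.argmax(scores)))

The sketch keeps the two memories separate so the frame-level word attention feeds into the clip-level subtitle attention, mirroring the layered structure named in the abstract; the actual LMN encoders, projections, and training objective are not reproduced here.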

Cited by 26 publications (1 citation statement)
References 23 publications
“…Neither of these tasks currently involves human cognitive reasoning within ToM. Factoid questions directly inquire about visual facts, such as locations or colors (Wang et al. 2018; Lei et al. 2018; Maharaj et al. 2017). On the other hand, inference VideoQA explores the logic within dynamic scenarios (Xiao et al. 2021; Yi et al. 2019; Mao et al. 2022).…”
Section: Related Work
confidence: 99%