Video question answering via grounded cross-attention network learning

Ye, Yunan; Shi-feng, Zhang; Li, Yimeng; Qian, Xufeng; Tang, Siliang; Pu, Shiliang; Xiao, Jun

doi:10.1016/j.ipm.2020.102265

Cited by 16 publications

(12 citation statements)

References 10 publications

“…YouTube2Text-QA result. In Table 5, we compare our methods with the state-of-the-art r-ANL [30] on YouTube2Text-QA dataset. It's worth mentioning that r-ANL utilized frame-level attributes as additional supervision to augment learning while our method does not.…”

Section: Methods Question Type (mentioning)

confidence: 99%

“…0.262). We also report the per-class accuracy to make direct comparison with [30], and our method is better than r-ANL in this evaluation method.…”

Section: Methods Question Type (mentioning)

confidence: 99%

“…VideoQA is considered to be a challenging problem as reasoning on video clip usually requires memorizing contextual information in temporal scale. Many models have been proposed to tackle this problem [5,10,27,30,31,32]. Many work [5,10,30] utilized both motion (i.e.…”

Section: Related Work (mentioning)

confidence: 99%

“…VGG [22], ResNet [8]) features to better represent video frames. Similar to the spatial mechanism widely used in VQA methods to find relevant image regions, many VideoQA work [5,10,27,30] applied temporal attention mechanism to attend to most relevant frames of a video clip. Jang [10] utilized both appearance and motion features as video representations and applied spatial and temporal attention mechanism to attend to both relevant regions of a frame and frames of a video.…”

Section: Related Work (mentioning)

confidence: 99%

“…The questions are open-ended with pre-defined answer sets of size 1000. YouTube2Text-QA [30] collected three types of questions (what, who and other) from the YouTube2Text [7] video description corpus. The video source is also MSVD [4].…”

Section: Dataset Descriptions (mentioning)

confidence: 99%

Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering

Cheng

Zhang

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

231

203

In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention. Our VideoQA model firstly generates the global context-aware visual and textual features respectively by interacting current inputs with memory contents. After that, it makes the attentional fusion of the multimodal visual and textual representations to infer the correct answer. Multiple cycles of reasoning can be made to iteratively refine attention weights of the multimodal data and improve the final representation of the QA pair. Experimental results demonstrate our approach achieves state-of-the-art performance on four VideoQA benchmark datasets.

Situational Emotions

Raikov

2024

SpringerBriefs in Applied Sciences and Technology

Automatic question-answer pairs generation and question similarity mechanism in question answering system

2021

With the swift growth of the information over the past few years, taking full benefit is increasingly essential. Question Answering System is one of the promising methods to access this much information. The Question Answering System lacks humans’ common sense and reasoning power and cannot identify unanswerable questions and irrelevant questions. These questions are answered by making unreliable and incorrect guesses. In this paper, we address this limitation by proposing a Question Similarity mechanism. Before a question is posed to a Question-Answering system, it is compared with possible generated questions of the given paragraph, and then a Question Similarity Score is generated. The Question Similarity mechanism effectively identifies the unanswerable and irrelevant questions. The proposed Question Similarity mechanism incorporates a human way of reasoning to identify unanswerable and irrelevant questions. This mechanism can avoid the unanswerable and irrelevant questions altogether from being posed to the Question Answering system. It helps the Question Answering Systems to focus only on the answerable questions to improve their performance. Along with this, we introduce an application of the Question Answering System that generates the question-answer pairs given a passage and is useful in several fields.
