Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.31
Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer

Abstract: Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse Graph Learning (SGL) method to formulate visual dialog as a graph structure learning task. SGL infers inherently sparse…
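The abstract frames visual dialog as graph structure learning over dialog rounds. As a rough illustration of that idea in Python (not the authors' implementation), the sketch below scores each previous round against the current question and keeps only a few edges, which is one simple way to obtain a sparse structure; the function name sparse_round_edges, the top_k parameter, and the embedding sizes are illustrative assumptions.

import numpy as np

def sparse_round_edges(query_vec, history_vecs, top_k=2):
    """Score each previous dialog round against the current question and
    keep only the top_k strongest edges, zeroing out the rest."""
    scores = history_vecs @ query_vec                     # one score per round
    keep = np.argsort(scores)[-top_k:]                    # indices of the strongest links
    sparse = np.zeros_like(scores)
    sparse[keep] = np.exp(scores[keep] - scores[keep].max())
    sparse[keep] /= sparse[keep].sum()                    # renormalize the kept edges
    return sparse                                         # sparse attention over rounds

# Usage with hypothetical sizes: 10 previous rounds, 64-dim sentence embeddings.
rng = np.random.default_rng(0)
edges = sparse_round_edges(rng.normal(size=64), rng.normal(size=(10, 64)))
print(edges)  # mostly zeros, with mass concentrated on a few related rounds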

Cited by 6 publications (5 citation statements)
References 33 publications (44 reference statements)
“…Recent studies finetune the VisDial models on the densely annotated labels in the validation dataset and evaluate the models on the test dataset. By following the method applied in SGL+KT [20], we also train the student model on the dense labels. In Table 8, models like VisDial-BERT and VD-BERT show huge improvements on NDCG and a counter-effect on other metrics when utilizing the dense labels.…”
Section: Discussion (mentioning)
confidence: 99%
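The statement above concerns finetuning on the dense annotations and its effect on NDCG. For readers unfamiliar with the metric, the following is a generic NDCG sketch in the spirit of the VisDial evaluation, where each candidate answer carries a ground-truth relevance score and NDCG is computed over the top-K positions with K equal to the number of relevant candidates; this is not the official evaluation code, and the example values are made up.

import numpy as np

def ndcg(relevances, ranked_candidates):
    """relevances: ground-truth relevance per candidate answer;
    ranked_candidates: candidate indices sorted by the model's score (best first)."""
    relevances = np.asarray(relevances, dtype=float)
    k = int((relevances > 0).sum())                       # K = number of relevant answers
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = float((relevances[ranked_candidates][:k] * discounts).sum())
    ideal = float((np.sort(relevances)[::-1][:k] * discounts).sum())
    return dcg / ideal if ideal > 0 else 0.0

# Usage with made-up numbers: 5 candidates, the model ranks candidate 2 first.
model_scores = np.array([-0.1, 0.3, 0.9, -0.5, 0.2])
ranking = np.argsort(-model_scores)                       # best-first order
print(ndcg([0.0, 0.5, 1.0, 0.0, 0.5], ranking))           # 1.0 for this ranking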
“…Prior work has developed a variety of attention mechanisms [5][6][7][8][9][10][11][12][13][18] considering the interactions among the image, dialog history, and question. Some studies [14,20] have attempted to discover the semantic structures of the dialog in the context of graph neural networks [64] using soft attention mechanisms [65]. From the learning algorithm perspective, all of them have relied on supervised learning on VisDial data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Then the high-level representation of cross-modal information is used to generate visually and contextually coherent responses. By leveraging a new structural loss function, Sparse Graph Learning (SGL) [69] learns the semantic relationships among dialog rounds and predicts sparse structures of the visually-grounded dialog. To identify multiple possible answers, the Knowledge Transfer (KT) method is also proposed to extract the soft scores of each candidate answer.…”
Section: Graph-based Semantic Relation (mentioning)
confidence: 99%
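The statement above summarizes the two components of the cited paper: SGL for predicting sparse dialog structures and Knowledge Transfer (KT) for extracting soft scores over candidate answers. As a hedged sketch of the knowledge-transfer idea only (not the paper's exact loss or settings), the snippet below matches a student's distribution over answer candidates to a teacher's soft scores with a temperature-scaled KL divergence; the temperature value and the shapes are assumptions.

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def knowledge_transfer_loss(student_logits, teacher_scores, temperature=2.0):
    """KL(teacher || student) between soft answer distributions for one question
    (100 candidate answers in the VisDial setup)."""
    p = softmax(np.asarray(teacher_scores) / temperature)   # teacher's soft targets
    q = softmax(np.asarray(student_logits) / temperature)   # student's distribution
    return float(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9))))

# Usage with random stand-in values for one question's 100 candidates.
rng = np.random.default_rng(1)
print(knowledge_transfer_loss(rng.normal(size=100), rng.normal(size=100)))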
“…In the generative setting, we consider recent generative baselines, including KBGN [17], DAM [19], LTMI [28], MITVG [9], GOG-Multi-Gen [7], and LTMI-LG [8]. In the discriminative setting, we consider state-of-the-art discriminative baselines, including KBGN [17], Modality-Balanced [21], FGA [35], DualVD [18], MCA [1], P1+P2 [32], SGL [20], CARE [25], LTMI-GOG-Multi [7], VisDial-BERT [27], VD-BERT [37], and VD-PCR [42].…”
Section: Baselines (mentioning)
confidence: 99%