2019
DOI: 10.48550/arxiv.1902.09368
Preprint

Dual Attention Networks for Visual Reference Resolution in Visual Dialog

Cited by 9 publications (10 citation statements)
References 12 publications
“…Compared to the results of the Visual Dialog Challenge 2019, our models also show strong results. Although ReDAN+ (Gan et al., 2019) and MReaL-BDAI show higher NDCG scores, our consensus dropout fusion model shows more balanced results across metrics while still having a competitive NDCG score compared to DAN (Kang, Lim, and Zhang, 2019): rank 3 on the NDCG metric and a high balance rank based on the metric average.…”
Section: Final Visual Dialog Test Results
confidence: 78%
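
The NDCG metric discussed in this statement scores how well a model's ranking of the 100 answer candidates agrees with dense human relevance annotations. The following is a minimal sketch of the computation, not code from any cited paper; the function name, the toy relevance scores, and the example rankings are all hypothetical illustrations.

```python
import numpy as np

def ndcg(relevances, ranking, k=None):
    """Normalized Discounted Cumulative Gain for one ranked candidate list.

    relevances: ground-truth relevance score per candidate (index-aligned).
    ranking: candidate indices ordered by model score, best first.
    """
    rel = np.asarray(relevances, dtype=float)
    k = k if k is not None else len(ranking)
    # Logarithmic position discount: position i contributes rel / log2(i + 2).
    discounts = 1.0 / np.log2(np.arange(k) + 2)
    dcg = float(np.sum(rel[ranking[:k]] * discounts))
    # Ideal DCG: the same discounting applied to a perfect ordering.
    ideal = float(np.sum(np.sort(rel)[::-1][:k] * discounts))
    return dcg / ideal if ideal > 0 else 0.0

# Hypothetical example: 5 candidates, candidate 2 is the most relevant.
print(ndcg([0.0, 0.5, 1.0, 0.0, 0.5], ranking=[2, 1, 4, 0, 3]))  # 1.0 (perfect order)
print(ndcg([0.0, 0.5, 1.0, 0.0, 0.5], ranking=[0, 3, 2, 1, 4]))  # < 1.0 (misranked)
```

Because NDCG rewards placing all densely annotated relevant answers near the top, a model can lead on NDCG while trailing on rank-of-ground-truth metrics such as MRR, which is why the quoted comparison stresses balance across metrics.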
“…Multimodal models have proven their ability to model interactions between different modalities and to better understand the semantics behind textual utterances [13,19,32,51,56,58,64,65,78], and pretraining on additional data gives further performance boosts for a variety of established vision-and-language tasks, such as visual question answering [3,46,52], visual commonsense reasoning [37,43] and text-to-image generation [49,59,60]. However, these works focus on QA-style visual dialog, rather than the conversation style with which we are more concerned.…”
Section: Jointly Modeling Visual and Textual Information
confidence: 99%
“…Visual Dialog Generation: Most existing works apply attention mechanisms to model the interplay between text and visual contexts (Lu et al., 2017; Kottur et al., 2018; Jiang and Bansal, 2019; Yang et al., 2019; Guo et al., 2019; Niu et al., 2019; Kang et al., 2019; Park et al., 2020; Jiang et al., 2020b). Other techniques such as reinforcement learning (Das et al., 2017b; Wu et al., 2018), variational auto-encoders (Massiceti et al., 2018) and graph networks (Jiang et al., 2020a) have also been applied to the visual dialog task.…”
Section: Dialog Generation
confidence: 99%
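
The attention mechanisms this statement surveys largely share one core operation: using a text representation to weight a set of visual region features. Below is a minimal sketch of such question-guided visual attention; the function name, feature dimensions, and region count are illustrative assumptions, not the architecture of DAN or any specific cited model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def question_guided_attention(question_vec, region_feats):
    """Attend over image region features using an encoded question.

    question_vec: (d,) encoded question vector.
    region_feats: (num_regions, d) visual features, e.g. from an object detector.
    Returns an attended visual summary of shape (d,).
    """
    d = question_vec.shape[0]
    # Scaled dot-product score between the question and each region.
    scores = region_feats @ question_vec / np.sqrt(d)
    weights = softmax(scores)        # one weight per region, summing to 1
    return weights @ region_feats    # weighted sum of region features

# Illustrative usage with random features (36 regions is a common detector setting).
rng = np.random.default_rng(0)
q = rng.standard_normal(512)
v = rng.standard_normal((36, 512))
attended = question_guided_attention(q, v)
print(attended.shape)  # (512,)
```

The alternatives the quote lists differ in what replaces or augments this step: reinforcement learning changes the training signal, variational auto-encoders change the generation objective, and graph networks replace the flat region set with explicit relational structure.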