2020
DOI: 10.48550/arxiv.2007.03310
Preprint
DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

Cited by 2 publications (2 citation statements)
References 9 publications
“…VisDial: Similar to the previous work (Kang et al 2023), we compare the performance of our method with 10 baselines: 1) Attention-based models: CoAtt (Wu et al 2018), HCIAE (Lu et al 2017), Primary (Guo, Xu, and Tao 2019), ReDAN (Gan et al 2019), DMRM (Chen et al 2020a), DAM (Jiang et al 2020b); 2) Graph-based models: KBGN (Jiang et al 2020a), LTMI (Nguyen, Suganuma, and Okatani 2020), LTMI-GoG (Chen et al 2021); 3) Semi-supervised learning model: GST (Kang et al 2023).…”
Section: Baselines
Citation type: mentioning (confidence: 99%)
“…Then attention-based models (Lu et al, 2017; Wu et al, 2018; Kottur et al, 2018) are proposed to dynamically attend to spatial image features in order to find related visual content. Furthermore, models based on object-level image features (Gan et al, 2019; Chen et al, 2020a; Jiang et al, 2020a; Nguyen et al, 2020; Jiang et al, 2020b) are proposed to effectively leverage the visual content for multimodal co-reference. However, as implicit exploration of multimodal co-reference, these methods implicitly attend to spatial or object-level image features, which are trained with the whole model and are inevitably distracted by unnecessary visual content.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)