Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020
DOI: 10.24963/ijcai.2020/96

DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

Abstract: The Visual Dialogue task requires an agent to engage in a conversation with a human about an image. The ability to generate detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation. In this paper, we propose a novel generative decoding architecture to generate high-quality responses, which moves away from decoding the whole encoded semantics towards a design that advocates both transparency and flexibility. In this architecture, word generation is decomposed i…
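The abstract is truncated, but the title suggests that per-word generation is decomposed into deliberation, abandon, and memory steps rather than decoding the whole encoded semantics at once. The following is a minimal, hypothetical PyTorch sketch of what one such per-word decoding step could look like; the module names and wiring are assumptions inferred from the title and truncated abstract, not the published architecture.

import torch
import torch.nn as nn

class DAMDecoderStep(nn.Module):
    # Hypothetical per-word decoding step. Instead of consuming the whole
    # encoded semantics, each word selects (Deliberation), filters (Abandon),
    # and tracks (Memory) information. Illustrative sketch only.
    def __init__(self, hidden_dim):
        super().__init__()
        self.deliberate = nn.MultiheadAttention(hidden_dim, num_heads=1, batch_first=True)
        self.abandon_gate = nn.Linear(2 * hidden_dim, hidden_dim)
        self.memory_cell = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, word_state, encoded_semantics, memory):
        # word_state: (batch, hidden); encoded_semantics: (batch, seq, hidden); memory: (batch, hidden)
        # Deliberation: attend over the encoder outputs for this word only.
        attended, _ = self.deliberate(word_state.unsqueeze(1), encoded_semantics, encoded_semantics)
        attended = attended.squeeze(1)
        # Abandon: gate out information already covered by the memory,
        # discouraging repetitive responses.
        gate = torch.sigmoid(self.abandon_gate(torch.cat([attended, memory], dim=-1)))
        filtered = gate * attended
        # Memory: update the running record of what has been generated so far.
        new_memory = self.memory_cell(filtered, memory)
        return filtered, new_memory

# Example usage with dummy tensors:
# step = DAMDecoderStep(hidden_dim=512)
# enc = torch.randn(2, 36, 512)   # encoded image/dialogue features (assumed shape)
# mem = torch.zeros(2, 512)
# out, mem = step(torch.randn(2, 512), enc, mem)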

Citations: Cited by 19 publications (9 citation statements)
References: 0 publications
“…ReDAN [9] adopts multi-step reasoning and outperforms our model on some metrics. DMRM [4] and DAM [16] achieve higher performance by designing a more complex generative decoder. HACAN [46] introduces multihead attention and two-stage training, achieving comparable results with us.…”
Section: Overall Results
confidence: 99%
“…RvA (Niu et al, 2019), DVAN (Guo et al, 2019b) and DMRM (Chen et al, 2020a), DAM (Jiang et al, 2020c).…”
Section: Model
confidence: 99%
“…The visual dialogue task was proposed by Das et al [6], and requires an agent to answer multi-round questions about a static image [7,18,20]. Previous work [12,19,21,24,41,57,61] focused on developing different attention mechanisms to model the interactions among image, question, and dialogue history [56].…”
Section: Visual Dialogue
confidence: 99%