Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
DOI: 10.18653/v1/2021.emnlp-main.736

Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy

Abstract: Generating goal-oriented questions in Visual Dialogue tasks is a challenging and longstanding problem. State-of-the-art systems are shown to generate questions that, although grammatically correct, often lack an effective strategy and sound unnatural to humans. Inspired by the cognitive literature on information search and cross-situational word learning, we design Confirm-it, a model based on a beam search re-ranking algorithm that guides an effective goal-oriented strategy by asking questions that confirm th…
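To make "beam search re-ranking" concrete, the sketch below reorders finished beam candidates by a confirmation score before picking the question to ask. It is a minimal illustration under stated assumptions, not the Confirm-it implementation: the abstract is truncated here, so the actual re-ranking criterion is not given on this page. `Candidate`, `rerank_beam`, the `confirms_guess` scorer, and the toy belief that the referent is "the red car" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """One finished beam hypothesis (hypothetical structure)."""
    question: str
    log_prob: float  # decoder score assigned during beam search


def rerank_beam(
    beam: List[Candidate],
    confirms_guess: Callable[[str], float],
) -> List[Candidate]:
    """Reorder beam candidates so that questions scored as confirming the
    model's current guess come first; ties fall back to log-probability.

    `confirms_guess` is an assumed scorer standing in for whatever criterion
    the real system uses (not specified on this page).
    """
    return sorted(
        beam,
        key=lambda c: (confirms_guess(c.question), c.log_prob),
        reverse=True,
    )


# Toy usage: the model is assumed to currently believe the referent is the red car.
def toy_scorer(question: str) -> float:
    return 1.0 if "red car" in question else 0.0


beam = [
    Candidate("is it a person?", -1.2),
    Candidate("is it the red car?", -2.0),
    Candidate("is it on the left?", -1.5),
]
best = rerank_beam(beam, toy_scorer)[0]
print(best.question)  # is it the red car?
```

Plain beam search would return "is it a person?" because it has the highest log-probability; the re-ranking step promotes the question that checks the current guess, even though the decoder scored it lower.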

Cited by 7 publications (12 citation statements)
References: 21 publications

“…Existing interactive robots/agents using multimodal features have focused on question-answering from images [14], [15], request analysis [2], and conversations about images [16], [17], [18], [19]. Whether incorporating situation understanding results from multimodal cues significantly improves related tasks has been investigated to determine if we can clearly define things to be recognized for tasks.…”
Section: B. Using Multimodal Cues for Action Decisions (mentioning)
confidence: 99%
“…Simulating Dual-coding theory of human cognition to adaptively find query-related information from the image. Testoni et al. [161]: Asking questions to confirm the conjecture of models about the referent guided by human cognitive literature.…”
Section: Unique Training Schemes-Based VAD (mentioning)
confidence: 99%
“…Motivated by Dual-coding theory [124] of human cognition, Dual Encoding Visual Dialogue (DualVD) model [65] adaptively finds query-related information from the image through intra-modal visual features and inter-modal visual-semantic knowledge semantics. Based on a beam search re-ranking algorithm, Testoni et al. propose Confirm-it [161], which asks questions to confirm the conjecture of models about the referent with human cognitive literature on information search and cross-situational word learning. To explore the ability of AI dialogue agents to both ask questions and answer them as humans, researchers have made preliminary explorations.…”
Section: Unique Training (mentioning)
confidence: 99%
“…Starting from a probability distribution over all candidate tokens in the vocabulary, this technique samples the next token from the set of candidates defined as the top-p subset of the cumulative probability mass. Recently, Testoni and Bernardi (2021b) propose a beam-search re-ranking strategy to promote the generation of more effective questions throughout the dialogue. In this paper, we focus on the effect of different training sets using the same decoding strategy.…”
Section: Figure (mentioning)
confidence: 99%
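The decoding technique described in the last statement, top-p (nucleus) sampling, can be sketched in a few lines: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, renormalize, and sample from that set. This is a minimal illustration assuming a toy vocabulary given as a dict of token probabilities; `top_p_filter`, `sample_top_p`, and the example numbers are hypothetical and not taken from the citing paper.

```python
import random
from typing import Dict, List, Tuple


def top_p_filter(probs: Dict[str, float], p: float) -> List[Tuple[str, float]]:
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus: List[Tuple[str, float]] = []
    total = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break
    return [(token, prob / total) for token, prob in nucleus]


def sample_top_p(probs: Dict[str, float], p: float, rng: random.Random) -> str:
    """Draw the next token from the renormalized top-p nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = [token for token, _ in nucleus]
    weights = [weight for _, weight in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token distribution (made-up numbers for illustration).
probs = {"yes": 0.5, "no": 0.3, "maybe": 0.15, "car": 0.05}
print(top_p_filter(probs, 0.75))  # keeps only "yes" and "no" (cumulative mass 0.8 >= 0.75)
print(sample_top_p(probs, 0.75, random.Random(0)))
```

Unlike a fixed top-k cutoff, the size of the candidate set adapts to how peaked the distribution is: a confident model keeps few tokens, an uncertain one keeps many.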