Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.95

Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games

Abstract: In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle. An effective strategy for the players is to learn conceptual representations of objects that are both discriminative and expressive enough to ask questions and guess correctly. However, as shown by Suglia et al. (2020), existing models fail to learn truly multi-modal representations, relying instead on gold category labels for objects in the scene both at training and inference time. This provides a…

Cited by 8 publications (11 citation statements)
References 28 publications (47 reference statements)
“…An accurate representation of state needs to be maintained as new information and observations accumulate [68]. Again, in recent deep learning approaches, no explicit state representation is developed, and the state information is encoded using sequences of prior turns in the interaction [10,62,65].…”
Section: State Tracking (mentioning)
confidence: 99%
“…Another example is using context to re-rank possible system responses - either at the level of DM or NLG decision-making [56]. A particular recent focus is on the use and adaptation of large pre-trained vision-and-language models in interactive systems [63,65].…”
Section: Machine Learning Models Of Language Processing (mentioning)
confidence: 99%
“…By operating only in simulation, our model also misses the full range of experience that can ground language in the world [11], such as haptic feedback during object manipulation [78,79,68], and audio [16] and speech [31,41] features of the environment. Further, in ALFRED an agent never encounters novel object classes at inference time, which represent an additional challenge for successful task completion [72].…”
Section: Limitations and Impact (mentioning)
confidence: 99%
“…Humans highly rely on their prediction skills when interpreting a new input, integrating their perceptual signal with prior knowledge. We hope that more awareness of cognitive and neuroscience findings towards the combination of bottom‐up (perceptual) and top‐down (prior) knowledge will help shaping new multimodal models (Schüz & Zarrieß, 2020; Suglia et al., 2020; Testoni, Pezzelle et al., 2019).…”
Section: Open Challenges and Future Directions (mentioning)
confidence: 99%