Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (2022)
DOI: 10.18653/v1/2022.emnlp-main.546

Communication breakdown: On the low mutual intelligibility between human and neural captioning

Abstract: We compare the 0-shot performance of a neural caption-based image retriever when given as input either human-produced captions or captions generated by a neural captioner. We conduct this comparison on the recently introduced IMAGECODE data-set (Krojer et al., 2022), which contains hard distractors nearly identical to the images to be retrieved. We find that the neural retriever has much higher performance when fed neural rather than human captions, despite the fact that the former, unlike the latter, were gen…

Cited by 1 publication (1 citation statement); references 16 publications.
“…Human-centered EmCom research is developed in game settings and thus employs reinforcement learning, whereas IC predominantly uses supervised learning. While Hum-EmCom explicitly models the interaction among multiple agents with a shared goal, IC is focused on architectures capable of mimicking the human ability to use language in a visual context, which may not align with human understanding [168]. As both fields aim to refine artificial languages to better resemble human-like ones, they should be regarded as complementary components of the broader challenge of achieving this goal.…”
Section: B Human-centered EmCom
Citation type: mentioning
confidence: 99%