2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00904

nocaps: novel object captioning at scale

Abstract: Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data.

Cited by 233 publications (198 citation statements)
References 40 publications

“…Furthermore, when evaluating out-of-domain images or images with unseen concepts, it has been shown that the generated captions are often of poor quality (Mao et al., 2015; Vinyals et al., 2017). Attempts have been made to address the latter issue by leveraging unpaired text data or pre-trained language models (Hendricks et al., 2016; Agrawal et al., 2018).…”
Section: Introduction
confidence: 99%

“…Multilingual Visual Understanding. Numerous tasks have been proposed that combine vision and language to enhance the understanding of either or both, such as video/image captioning [18,60,2], visual question answering (VQA) [4], and natural language moment retrieval [25]. Multilingual studies are rarely explored in the vision and language domain.…”
Section: Related Work
confidence: 99%

“…Many of the popular captioning datasets in the AI community were created using the same basic crowdsourcing task design. This task design, first developed in 2013 [49,106], remains the standard approach [7,28]. One concern about crowdsourced datasets built using this standard task design is that captions for the same image generated by different people can vary considerably [53,98].…”
Section: The Critical Foundation Of Image Captioning Algorithms: Larg…
confidence: 99%