2019
DOI: 10.1109/tmm.2019.2896494
COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval

Abstract: This paper contributes to cross-lingual image annotation and retrieval in terms of data and baseline methods. We propose COCO-CN, a novel dataset enriching MS-COCO with manually written Chinese sentences and tags. For more effective annotation acquisition, we develop a recommendation-assisted collective annotation system, automatically providing an annotator with several tags and sentences deemed to be relevant with respect to the pictorial content. Having 20,342 images annotated with 27,218 Chinese sentences a…

Cited by 97 publications (74 citation statements). References 40 publications (83 reference statements).
“…Pappas et al. [42] propose multilingual visual concept clustering to study the commonalities and differences among different languages. Meanwhile, multilingual image captioning has been introduced to describe the content of an image in multiple languages [32,57,33]. However, none of these works studies the interaction between videos and multilingual knowledge.…”
Section: Related Work
Confidence: 99%
“…Yoshikawa et al. [27] further enlarged the collection of Japanese captions for MS COCO and released the STAIR Captions dataset. There are also extensions for Chinese, such as [28], [29]. Li et al. [28] presented a comparison of Chinese caption datasets constructed by crowdsourcing and by machine translation.…”
Section: B. Cross-Lingual Vision and Language
Confidence: 99%
“…Li et al. [28] presented a comparison of Chinese caption datasets constructed by crowdsourcing and by machine translation. Li et al. [29] added Chinese captions and tags to MS COCO. For video captions, Chen and Dolan [30] collected short video clips and captions in many different languages.…”
Section: B. Cross-Lingual Vision and Language
Confidence: 99%
“…Besides, image captioning is mostly done in English, as most of the available datasets are in this language [31,34]. Only a few studies have been conducted on cross-lingual image captioning [17], [38], [39]. In this paper, the model is designed to perform cross-lingual image captioning.…”
Section: Related Work
Confidence: 99%