2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00198

Describing and Localizing Multiple Changes with Transformers

Abstract: Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes. Existing change captioning studies have mainly focused on scenes with a single change. However, detecting and describing multiple changed parts in image pairs is essential for enhancing adaptability to complex scenarios. We solve the above issues from three aspects: (i) We propose a CG-based multi-change captioning dataset; (ii) We benchmark existing …

Cited by 39 publications (15 citation statements)
References 34 publications
“…To date, no prior research has specifically addressed the "image difference question answering" problem. Only a few studies have focused on the general image difference caption task, such as MMCFormers [25] and IDCPCL [31]. Therefore, our work serves as the first step in this new direction and provides a valuable contribution to the research community.…”
Section: Baselines | Type: mentioning
confidence: 97%
“…Within the language generation and vision research domain, the most related works to the medical image difference VQA task is image difference captioning [20,25,31], which is designed to identify object movements and changes within a spatial context such as a static or complex background. As shown in the left Fig.…”
Section: Anatomical Structure-aware Graph Construction and Feature Le... | Type: mentioning
confidence: 99%
“…Most of the existing works in the change detection literature belong to this category. [18,17,21,12] tackle the change captioning problem where the goal is to describe the changes in an image pair in natural language. These methods mainly evaluate their approach on the STD [10] (images from fixed video surveillance camera), or CLEVR-based change datasets [18,21,12] (synthetic images of 3D objects of primitive shapes).…”
Section: Related Work | Type: mentioning
confidence: 99%
“…According to the number of input images, they can be divided into two categories: Two-image based and Group-based captioning. Two-image based captioning tends to describe the common [36] or different [28,30,37,49] parts between the two images. Thus, the two images in their settings always have strong correlations.…”
Section: Related Work | Type: mentioning
confidence: 99%