Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

Shi, Xiangxi; Xu, Yuebin; Gu, Jiuxiang; Joty, Shafiq; Cai, Jianfei

doi:10.1007/978-3-030-58568-6_34

Cited by 35 publications

(29 citation statements)

References 22 publications


“…To validate the generalization ability of the proposed method, we conduct the experiments on a recently published Spot-the-Diff dataset, where the image pairs are mostly well aligned and there is no viewpoint change. We compare with eight SOTA methods and most of them cannot consider handling viewpoint changes: DDLA (Jhamtani and Berg-Kirkpatrick, 2018), DDUA (Park et al, 2019), SDCM (Oluwasanmi et al, 2019a), FCC (Oluwasanmi et al, 2019b), static rel-att / dynamic rel-att (Tan et al, 2019), and M-VAM / M-VAM+RAF (Shi et al, 2020).…”

Section: Results On Spot-the-diff (mentioning)

confidence: 99%

“…However, it is built upon an ideal situation by assuming there are no distractors (illumination/viewpoint change) between a pair of images. To make this task closer to our dynamic world, Park et al and Shi et al (Park et al, 2019; Shi et al, 2020) both aimed to address change captioning in the existence of distractors. On one hand, Park et al directly concatenated the coarse feature difference with the image pair to operate spatial attention to localize the change.…”

Section: Related Work (mentioning)

confidence: 99%

“…When observing the performance of two kinds of combinations (SR-DRL and SRDRL+AVS), both of them improve the baseline in all metrics, which indicates the robustness of our overall model is strong. (Shi et al, 2020), in four dimensions: 1) the total performance of scene change and none-scene change; 2) only scene change; 3) only none-scene change; 4) specific type of scene change. The comparison results are shown in Table 3, Table 4, and Table 5, respectively.…”

Section: Ablation Studies (mentioning)

confidence: 99%

“…Compared to semantic changes, both illumination and viewpoint changes are irrelevant distractors, so realistic change captioning requires a model: 1) distinguishing semantic changes (e.g., an object has moved) from distractors (e.g., a viewpoint change) and 2) conveying the detected change in a logically and grammatically accurate sentence. To this end, recent works (Park et al, 2019; Shi et al, 2020) focused on addressing change captioning in the presence of distractors.…”

Section: <Before> <After> <Change Caption> (mentioning)

confidence: 99%


Semantic Relation-aware Difference Representation Learning for Change Captioning

Tu, Yao, Li et al. 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021


Change captioning is to describe the difference in a pair of images with a natural language sentence. In this task, distractors such as illumination or viewpoint changes pose a major challenge to learning the difference representation. In this paper, we propose a semantic relation-aware difference representation learning network to explicitly learn the difference representation in the presence of distractors. Specifically, we introduce a self-semantic relation embedding block to explore the underlying changed objects and design a cross-semantic relation measuring block to localize the real change and learn the discriminative difference representation. Besides, relying on the POS of words, we devise an attention-based visual switch to dynamically use visual information for caption generation. Extensive experiments show that our method achieves state-of-the-art performance on the CLEVR-Change and Spot-the-Diff datasets.


“…Hence, feature shift between two unaligned images will adversely affect the learning of difference representation. To make this task more practical, recent works (Park et al, 2019; Shi et al, 2020) proposed to address change captioning in the presence of viewpoint changes. Despite the progress, there are some limitations for the above state-of-the-art methods when modeling the difference representation.…”

Section: Introduction (mentioning)

confidence: 99%

R$^3$Net: Relation-embedded Representation Reconstruction Network for Change Captioning

Tu, Li, Yan et al. 2021

Preprint


Change captioning is to use a natural language sentence to describe the fine-grained disagreement between two similar images. Viewpoint change is the most typical distractor in this task, because it changes the scale and location of the objects and overwhelms the representation of real change. In this paper, we propose a Relation-embedded Representation Reconstruction Network (R$^3$Net) to explicitly distinguish the real change from the large amount of clutter and irrelevant changes. Specifically, a relation-embedded module is first devised to explore potential changed objects in the large amount of clutter. Then, based on the semantic similarities of corresponding locations in the two images, a representation reconstruction module (RRM) is designed to learn the reconstruction representation and further model the difference representation. Besides, we introduce a syntactic skeleton predictor (SSP) to enhance the semantic interaction between change localization and caption generation. Extensive experiments show that the proposed method achieves state-of-the-art results on two public datasets.


Distractors-Immune Representation Learning with Cross-Modal Contrastive Regularization for Change Captioning

Tu, Li et al. 2024

Lecture Notes in Computer Science

