Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475483

Cross-modal Joint Prediction and Alignment for Composed Query Image Retrieval

Abstract: In this paper, we focus on the composed query image retrieval task, namely retrieving the target images that are similar to a composed query, in which a modification text is combined with a query image to describe a user's accurate search intention. Previous methods usually focus on learning the joint image-text representations, but rarely consider the intrinsic relationship among the query image, the target image and the modification text. To address this problem, we propose a new cross-modal joint prediction…
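To make the task setting concrete, here is a minimal Python sketch of composed query retrieval. It assumes that the query image, the modification text and every gallery image have already been encoded into fixed-length vectors of the same dimension. The fusion step (an element-wise sum followed by L2 normalization) is a naive stand-in chosen for illustration. It is not the cross-modal joint prediction and alignment method proposed in the paper. The function names, gallery keys and toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(image_feat: Sequence[float], text_feat: Sequence[float]) -> List[float]:
    """Hypothetical fusion: element-wise sum of image and text features, then normalize."""
    return l2_normalize([a + b for a, b in zip(image_feat, text_feat)])


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def retrieve(image_feat: Sequence[float], text_feat: Sequence[float],
             gallery: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Rank gallery images by cosine similarity to the composed query; return the top-k keys."""
    query = compose_query(image_feat, text_feat)
    ranked = sorted(gallery.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy 3-d features: the text vector pushes the query toward the second axis.
    toy_gallery = {
        "red_dress": [1.0, 0.0, 0.0],
        "red_dress_long": [1.0, 1.0, 0.0],
        "blue_shirt": [0.0, 0.0, 1.0],
    }
    print(retrieve([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], toy_gallery, k=2))
```

A learned model would replace `compose_query` with a trained compositor and produce the features with image and text encoders. The ranking step by cosine similarity is the part that stays the same across most such setups.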

Cited by 16 publications (8 citation statements: 0 supporting, 8 mentioning, 0 contrasting)
References 37 publications

“…Subsequently, more and more works focus on this CTI-IR task. The previous works [4,5,12,15,18,19,34,36,39,41] can be categorized into two types. The first type of works [5,15,19,34,36] mainly focus on the multi-modal fusion between image and text queries.…”
Section: Related Work, 2.1 Image Retrieval (mentioning, confidence 99%)

“…Previous approaches [5,12,15,18,19,34,36,39] for this task can be categorized into two types. The first type of works [5,15,19,34,36] mainly focus on designing complex components for the multi-modal fusion between text and image queries.…”
Section: Introduction (mentioning, confidence 99%)

“…A compositor plays a fundamental role to integrate the textual information with the imagery modality. TGR compositors have been proposed based on various techniques, such as gating mechanism [49], hierarchical attention [7,23,12,20], graph neural network [54,44], joint learning [6,27,44,52,55], ensemble learning [50], style-content modification [29,5] and vision & language pre-training [32].…”
Section: Related Work (mentioning, confidence 99%)
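The citation statement above lists a gating mechanism as one way to build a compositor. The following minimal Python sketch shows a gated-residual compositor in that general spirit. The concatenated image and text features drive a sigmoid gate that rescales the image feature, and a linear residual term adds a text-conditioned correction. This is an illustrative assumption, not the exact formulation of the cited work [49] or of the paper on this page. The weight matrices are hypothetical toy values rather than learned parameters, and `gated_compose` is a hypothetical name.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_compose(img: Sequence[float], txt: Sequence[float],
                  w_gate: Matrix, w_res: Matrix) -> List[float]:
    """Hypothetical gated-residual compositor.

    out = sigmoid(W_gate [img; txt]) * img + W_res [img; txt]
    Both matrices have len(img) rows and len(img) + len(txt) columns.
    """
    joint = list(img) + list(txt)                           # concatenation [img; txt]
    gate = [sigmoid(z) for z in matvec(w_gate, joint)]      # per-dimension gate in (0, 1)
    residual = matvec(w_res, joint)                         # text-conditioned correction
    return [g * x + r for g, x, r in zip(gate, img, residual)]


if __name__ == "__main__":
    zeros = [[0.0] * 4 for _ in range(2)]
    # Residual weights that simply copy the text feature into the output.
    copy_text = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    print(gated_compose([2.0, -4.0], [1.0, 1.0], zeros, copy_text))
```

With all-zero gate weights the gate equals 0.5 in every dimension, so the image feature is halved before the residual is added. In a trained model both matrices would be learned jointly with the encoders, typically with a retrieval loss over the composed query and target image features.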