2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2021
DOI: 10.1109/cvprw53098.2021.00448
Leveraging Style and Content features for Text Conditioned Image Retrieval

Cited by 8 publications (6 citation statements)
References 7 publications
“…Anwaar et al (2021) use an autoencoder-based model to map the reference and the target images into the same complex space and learn the text modifier representation as a transformation in this space. Lee et al (2021) and Chawla et al (2021) both propose to disentangle the multi-modal information into content and style. resort to images' descriptive texts as side information to train a joint visual-semantic space, training a TIRG model on top.…”
Section: Related Work
confidence: 99%
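The "transformation in complex space" idea in the excerpt above can be illustrated with a minimal sketch: the image embedding is viewed as a vector of complex numbers, and the text modifier supplies per-dimension rotation angles. This is only a toy illustration of the general idea, not the cited model; the function name and shapes are assumptions.

```python
import numpy as np

def complex_compose(ref_emb, text_angles):
    """Rotate a reference image embedding, viewed as complex numbers,
    by per-dimension angles derived from the text modifier (a sketch)."""
    d = ref_emb.shape[0] // 2
    z = ref_emb[:d] + 1j * ref_emb[d:]        # view the embedding as complex
    z_rot = z * np.exp(1j * text_angles)      # text acts as a rotation
    return np.concatenate([z_rot.real, z_rot.imag])

rng = np.random.default_rng(0)
ref = rng.standard_normal(8)                  # toy 8-dim image embedding
angles = rng.standard_normal(4)               # toy per-dimension angles
out = complex_compose(ref, angles)
```

A rotation leaves the per-dimension complex magnitude unchanged, so the transformation modifies the embedding's "direction" while preserving its scale.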
“…In contrast to most methods described above, ARTEMIS does not compose modalities into a joint global feature for the query (Vo et al, 2019; Lee et al, 2021), does not compute costly cross-attention involving the target image (Hosseinzadeh & Wang, 2020; Chawla et al, 2021), and does not extract multi-level visual representations. Instead, it leverages the textual modifier in simple attention mechanisms to weight the dimensions of the visual representation, emphasizing the characteristics on which the matching should focus.…”
Section: Related Work
confidence: 99%
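The dimension-weighting idea described above can be sketched in a few lines: the text embedding produces per-dimension weights that emphasize text-relevant visual dimensions before scoring reference against target. This is a rough illustration of attention-as-dimension-weighting, not the actual ARTEMIS model; all names and shapes here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_match(ref_img, tgt_img, text_emb, W):
    """Score a (reference, text) query against a target image by letting
    the text choose which visual dimensions matter (toy sketch)."""
    a = sigmoid(W @ text_emb)                 # per-dimension weights from text
    q = a * ref_img                           # emphasize text-relevant dims
    t = a * tgt_img
    return float(q @ t / (np.linalg.norm(q) * np.linalg.norm(t) + 1e-8))

rng = np.random.default_rng(1)
d_img, d_txt = 6, 4
W = rng.standard_normal((d_img, d_txt))       # toy projection text -> weights
score = weighted_match(rng.standard_normal(d_img),
                       rng.standard_normal(d_img),
                       rng.standard_normal(d_txt), W)
```

Because the score is a cosine similarity over the reweighted features, it stays in [-1, 1]; no joint query feature and no cross-attention over the target are needed.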
“…A compositor plays a fundamental role in integrating the textual information with the imagery modality. TGR compositors have been proposed based on various techniques, such as gating mechanism [49], hierarchical attention [7,23,12,20], graph neural network [54,44], joint learning [6,27,44,52,55], ensemble learning [50], style-content modification [29,5] and vision & language pre-training [32].…”
Section: Related Work
confidence: 99%
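Of the compositor families listed above, the gating mechanism is the simplest to sketch: a learned sigmoid gate decides how much of the image feature to keep, while a residual term injects the text modification. This is a minimal illustration in the spirit of gated residual composition, not the exact model of [49]; the weight matrices and dimensions are assumptions.

```python
import numpy as np

def gated_compose(img, txt, Wg, Wr):
    """Gated residual compositor sketch: gate * img + residual(img, txt)."""
    xt = np.concatenate([img, txt])           # fuse image and text features
    gate = 1.0 / (1.0 + np.exp(-(Wg @ xt)))   # sigmoid gate over image dims
    residual = Wr @ xt                        # text-driven modification
    return gate * img + residual

rng = np.random.default_rng(2)
d_img, d_txt = 5, 3
Wg = rng.standard_normal((d_img, d_img + d_txt))
Wr = rng.standard_normal((d_img, d_img + d_txt))
composed = gated_compose(rng.standard_normal(d_img),
                         rng.standard_normal(d_txt), Wg, Wr)
```

The gate keeps the composed feature anchored to the reference image, which is useful when the text modifier describes only a small change.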
“…The next issue is the application of the image search algorithm. Image search is a fundamental task playing a significant role in the success of a wide variety of frameworks and applications [8]. An important method to compare semantic similarity between text and images is CLIP (Contrastive Language-Image Pre-Training).…”
Section: Introduction
confidence: 99%
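The text-image comparison CLIP performs reduces to scaled cosine similarity between L2-normalized embeddings. A minimal sketch of that scoring step, with precomputed vectors standing in for the actual CLIP text and image encoders (the temperature value is an assumption):

```python
import numpy as np

def clip_style_similarity(text_embs, image_embs, temperature=0.07):
    """CLIP-style scoring: L2-normalize both sides, then take scaled
    cosine similarities between every text and every image."""
    t = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    v = image_embs / np.linalg.norm(image_embs, axis=-1, keepdims=True)
    return (t @ v.T) / temperature            # rows: texts, cols: images

texts = np.array([[1.0, 0.0], [0.0, 1.0]])    # stand-in text embeddings
images = np.array([[1.0, 0.0], [0.5, 0.5]])   # stand-in image embeddings
sims = clip_style_similarity(texts, images)
```

For retrieval, each text row of `sims` is ranked (or softmaxed) over the image columns; the aligned pair scores highest.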