2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv51458.2022.00067

SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval

Cited by 18 publications (3 citation statements)
References 32 publications

“…To evaluate our model, we chose three real-world datasets: Fashion200K (Han et al 2017), Shoes (Guo et al 2018), and FashionIQ (Wu et al 2021). We compare our DWC with many SOTA MMIR methods, such as TIRG (Vo et al 2019), JAMMAL (Zhang et al 2020), LBF (Hosseinzadeh and Wang 2020), JVSM (Chen and Bazzani 2020), SynthTripletGAN (Tautkute and Trzcinski 2021), VAL (Chen, Gong, and Bazzani 2020), DCNet (Kim et al 2021), JPM (Yang et al 2021b), DATIR (Gu et al 2021), ComposeAE (Anwaar, Labintcev, and Kleinsteuber 2021), CoSMo (Lee, Kim, and Han 2021), CLVC-Net (Wen et al 2021), ARTEMIS (Delmas et al 2022), SAC (Jandial et al 2022), GA (Huang et al 2022), CIRPLANT (Liu et al 2021), Combiner w/ CLIP (Baldrati et al 2022b), and FashionVLP (Goenka et al 2022), where the methods in italic are based on VLP models.…”
Section: Methods (mentioning)
confidence: 99%
“…Generally, there are two families of works on image retrieval with text feedback, depending on whether a pre-trained model is used. The first line of works mainly studies how to properly combine the features of the two modalities [1,3,13,41]. Content-Style Modulation (CosMo) [18] proposes a new image-based compositor containing two independent modulators.…”
Section: Composed Image Retrieval With Text Feedback (mentioning)
confidence: 99%
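The excerpt above mentions the feature-combination family of text-conditioned retrieval models only at a high level. Below is a minimal, illustrative sketch of that general idea: a gated residual composer over pooled image and text features. The module name, layer choices, and dimensions are assumptions made for illustration; this is not the published CosMo, TIRG, or SAC implementation.

```python
# Illustrative sketch only: combine an image feature and a text feature into
# a single retrieval query via a gate (how much of the image to keep) plus a
# residual (the text-driven modification). Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualComposer(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512):
        super().__init__()
        joint = img_dim + txt_dim
        # Gate: how much of the original image feature to preserve.
        self.gate = nn.Sequential(nn.Linear(joint, img_dim), nn.Sigmoid())
        # Residual: the text-conditioned modification added on top.
        self.residual = nn.Sequential(
            nn.Linear(joint, img_dim), nn.ReLU(), nn.Linear(img_dim, img_dim)
        )

    def forward(self, img_feat, txt_feat):
        x = torch.cat([img_feat, txt_feat], dim=-1)
        composed = self.gate(x) * img_feat + self.residual(x)
        # L2-normalise so gallery ranking can use cosine similarity.
        return F.normalize(composed, dim=-1)

# Usage: build a composed query from a reference-image feature and a caption feature.
composer = GatedResidualComposer()
query = composer(torch.randn(4, 512), torch.randn(4, 512))  # shape (4, 512)
```

The composed query is then matched against gallery image features, typically with cosine similarity and a triplet or contrastive loss during training.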
“…A multi-label lazy learning approach was proposed based on an instance's k nearest neighbors, and the maximum a posteriori (MAP) principle was used to determine its category. Jandial et al (2022) presented a novel semantic attention composition framework for text-conditioned image retrieval, comprising semantic feature attention and semantic feature modification. However, this method can only retrieve a 3D model by its tags and cannot retrieve a 3D model based on its content.…”
Section: Introduction (mentioning)
confidence: 99%
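The statement above names SAC's two stages, semantic feature attention and semantic feature modification, without detail. The following is a hedged sketch of that general pattern: text-conditioned attention over image region features followed by a modification step. The shapes, layers, and class name are assumptions for illustration and do not reproduce the actual SAC code of Jandial et al.

```python
# Illustrative sketch only: attend over image regions using the text feature,
# then modify the attended feature into a retrieval query. Shapes and layer
# choices are assumptions, not the SAC implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttendAndModify(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # Scores each image region given the text (semantic attention stage).
        self.attn_score = nn.Linear(2 * dim, 1)
        # Rewrites the attended feature conditioned on the text (modification stage).
        self.modify = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, regions, txt):
        # regions: (B, R, D) region features; txt: (B, D) text feature.
        txt_exp = txt.unsqueeze(1).expand(-1, regions.size(1), -1)
        weights = torch.softmax(self.attn_score(torch.cat([regions, txt_exp], -1)), dim=1)
        attended = (weights * regions).sum(dim=1)  # (B, D) text-relevant image summary
        composed = self.modify(torch.cat([attended, txt], -1))
        return F.normalize(composed, dim=-1)

# Usage: 36 region features per image, one text feature per query.
model = AttendAndModify()
query = model(torch.randn(2, 36, 512), torch.randn(2, 512))  # shape (2, 512)
```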