“…To evaluate our model, we chose three real-world datasets: Fashion200K (Han et al. 2017), Shoes (Guo et al. 2018), and FashionIQ (Wu et al. 2021). We compare our DWC with many SOTA MMIR methods, including TIRG (Vo et al. 2019), JAMMAL (Zhang et al. 2020), LBF (Hosseinzadeh and Wang 2020), JVSM (Chen and Bazzani 2020), SynthTripletGAN (Tautkute and Trzcinski 2021), VAL (Chen, Gong, and Bazzani 2020), DCNet (Kim et al. 2021), JPM (Yang et al. 2021b), DATIR (Gu et al. 2021), ComposeAE (Anwaar, Labintcev, and Kleinsteuber 2021), CoSMo (Lee, Kim, and Han 2021), CLVC-Net (Wen et al. 2021), ARTEMIS (Delmas et al. 2022), SAC (Jandial et al. 2022), GA (Huang et al. 2022), CIRPLANT (Liu et al. 2021), Combiner w/ CLIP (Baldrati et al. 2022b), and FashionVLP (Goenka et al. 2022). Methods shown in italics are based on VLP models.…”