2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023
DOI: 10.1109/wacv56688.2023.00221

Content-Based Music-Image Retrieval Using Self- and Cross-Modal Feature Embedding Memory

Cited by 3 publications (1 citation statement)
References 46 publications
“…The multi-modal retrieval task has been studied using various modalities such as image-text retrieval (Zhang et al, 2020; Cheng et al, 2022; Luo et al, 2022; Xuan and Chen, 2023), video-text (Gorti et al, 2022), audio-image (Xu, 2020; Nakatsuka et al, 2023), video-audio (Surís et al, 2018; Gu et al, 2023) and audio-text (Kim et al, 2022; Xin et al, 2023). Particularly, CLIP4CLIP (Luo et al, 2022), which performs well in the video-text retrieval task by calculating the similarities between the features of each modality obtained from the encoder, and X-CLIP expands CLIP4CLIP and proposes a multi-grained regulation function to improve performance.…”
Section: Related Work (citation type: mentioning)
confidence: 99%