2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00337
Sound-Guided Semantic Image Manipulation

Cited by 22 publications (17 citation statements)
References 26 publications
“…Huang, Patrick, et al, 2021), sketch-based image retrieval (Jing et al, 2022), code search (D. Guo, Lu, Duan, et al, 2022), visual question answering (Z. Chen, Chen, et al, 2021), event detection (Elhoseiny et al, 2016; S. Wu et al, 2014), visual grounding (Tziafas & Kasaei, 2021), natural language grounding (Sinha et al, 2019), semantic image manipulation (S. H. Lee et al, 2022), medical image segmentation (Bian et al, 2022), video object segmentation (Zhao et al, 2021), sign language recognition (Madapana, 2020), tactile object recognition (H. Liu et al, 2018), and driver behavior recognition (Reiß et al, 2020). More cases can be found in this survey (Cao et al, 2020).…”
Section: Benchmark Datasets
confidence: 99%
“…Instead of an introduction that concentrates on the applications themselves, several datasets available in various scenarios are offered to readers as guidelines, such as cross‐modal classification and retrieval (Geigle et al, 2022; Mercea et al, 2022; Parida et al, 2020; Shvetsova et al, 2022; Wray et al, 2019), cross‐lingual retrieval (P.‐Y. Huang, Patrick, et al, 2021), sketch‐based image retrieval (Jing et al, 2022), code search (D. Guo, Lu, Duan, et al, 2022), visual question answering (Z. Chen, Chen, et al, 2021), event detection (Elhoseiny et al, 2016; S. Wu et al, 2014), visual grounding (Tziafas & Kasaei, 2021), natural language grounding (Sinha et al, 2019), semantic image manipulation (S. H. Lee et al, 2022), medical image segmentation (Bian et al, 2022), video object segmentation (Zhao et al, 2021), sign language recognition (Madapana, 2020), tactile object recognition (H. Liu et al, 2018), and driver behavior recognition (Reiß et al, 2020). More cases can be found in this survey (Cao et al, 2020).…”
Section: Model Evaluation Metrics and Datasets for MZSL
confidence: 99%
“…Recently, conditional information in other modalities, such as text [198]–[202] and speech [203]–[205], has attracted increasing research attention due to the development of pre-trained large-scale frameworks (e.g., CLIP [206]) and the availability of related datasets (e.g., CelebA-Dialog [207]). Moreover, novel modalities of supervision signal, such as biometrics (e.g., brain responses recorded via electroencephalography [208]) and sound [209], have also been utilized to learn feature representations for semantic editing.…”
Section: Challenges and Future Directions
confidence: 99%