2021
DOI: 10.1007/978-3-030-69544-6_8

Do We Need Sound for Sound Source Localization?

citations
Cited by 8 publications
(5 citation statements)
references
References 39 publications
“…Other approaches include those that determine the temporal alignment of videos and sounds [5], [14], [15], and hybrid approaches that combine both tasks [16]. In addition to simple single-domain applications such as sound/image classification [10], [11], [16] and action recognition [5], [15], these works demonstrate the benefits of learned features in complex cross-domain applications such as sound localization [3], [12]-[15], crossmodal retrieval [4], and sound separation [5]. However, the target of these prior works is limited to learning semantic cross-modal relationships.…”
Section: Related Work A: Self-supervised Audio-visual Learning; citation type: mentioning
confidence: 99%
“…There have been many sound source localization methods in the field of computer vision, e.g. mutual information and CCA [4], [5], CAM-based [6], [7], attention mechanism based [8], [9], [10], [11], those that utilize motion information [12], [13], [14]. Sound source localization and audio-visual sound source separation are closely related because it is necessary to identify the position of the sound source in an image in order to perform audiovisual sound source separation.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…However, this work still requires extra scene prior due to the lack of one-to-one annotations. Oya et al. [38] proposed a step-wise training strategy that first gets potential sounding objects based on visual information and then identifies the proposal based on audio information. Nevertheless, the experimental scenario in the work is relatively simple (two objects, one of which makes a sound).…”
Section: Sounding Object Localization In Visual Scenes; citation type: mentioning
confidence: 99%