2020
DOI: 10.1007/978-3-030-58545-7_38

VisualEchoes: Spatial Image Representation Learning Through Echolocation

Citation Types: 2 supporting, 41 mentioning, 0 contrasting

Cited by 50 publications (43 citation statements)
References 78 publications

“…Performance in spatial tasks can benefit greatly from multimodality [5,57]. Auditory cues are extremely useful for spatial tasks in VR and have therefore been widely explored: the effect of sound beacons on navigation performance when no visual cues are available has been studied [230], with some works showing that navigation without visual information is possible using auditory cues alone [62].…”
Section: Multimodality In Users' Performance (mentioning)
confidence: 99%
“…Auditory cues are extremely useful for spatial tasks in VR and have therefore been widely explored: the effect of sound beacons on navigation performance when no visual cues are available has been studied [230], with some works showing that navigation without visual information is possible using auditory cues alone [62]. Other works have exploited this, proposing a novel technique to visualize sounds, similar to how echolocation works in animals, which improved spatial perception in VR thanks to the integration of auditory and visual information [171], or combining the spatial information contained in echoes to benefit visual tasks requiring spatial reasoning [57]. Other senses have also been explored with the goal of enhancing spatial search tasks: Ammi and Katz [5] proposed a method coupling auditory and haptic information to enhance spatial reasoning, thus improving performance in search tasks.…”
Section: Multimodality In Users' Performance (mentioning)
confidence: 99%
“…Christensen et al. [20] predict depth maps from real-world scenes using echo responses. Gao et al. [21] learn visual representations by echolocation in a simulated environment [22]. In contrast, we learn through passive observation rather than active sensing.…”
Section: Related Work (mentioning)
confidence: 99%
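
The excerpt above summarizes the echolocation pretext idea in a single sentence, so a minimal sketch may help. Everything below is an illustrative assumption rather than the cited paper's actual architecture: the module names, layer sizes, input shapes, and the orientation-classification objective are placeholders for the general recipe of pairing an image with a received echo and training a visual encoder through that pairing.

```python
# Minimal sketch of an echolocation-style pretext task. All names, shapes,
# and the orientation-classification objective are illustrative assumptions,
# not the cited paper's exact setup.
import torch
import torch.nn as nn

class EchoPretextModel(nn.Module):
    def __init__(self, num_orientations: int = 4):
        super().__init__()
        # Visual encoder: maps an RGB image to a feature vector.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Audio encoder: maps a two-channel echo spectrogram to a feature vector.
        self.audio_encoder = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Pretext head: from the fused features, predict the agent
        # orientation at which the echo was received.
        self.classifier = nn.Linear(64 + 64, num_orientations)

    def forward(self, image: torch.Tensor, echo_spec: torch.Tensor) -> torch.Tensor:
        v = self.visual_encoder(image)     # (B, 64)
        a = self.audio_encoder(echo_spec)  # (B, 64)
        return self.classifier(torch.cat([v, a], dim=1))

# One pretext training step on dummy tensors.
model = EchoPretextModel()
loss_fn = nn.CrossEntropyLoss()
image = torch.randn(8, 3, 128, 128)   # RGB views
echo = torch.randn(8, 2, 64, 64)      # binaural echo spectrograms
labels = torch.randint(0, 4, (8,))    # orientation indices
loss = loss_fn(model(image, echo), labels)
loss.backward()
```

After pretext training, the classifier head would be discarded and the visual encoder reused as the spatial image representation for downstream tasks.
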
“…Ongoing work continues to explore audio-visual navigation models for embodied agents [8,9,14,21,33]. Other work predicts depth maps using spatial audio [11] and learns representations via interaction using echoes recorded in indoor 3D simulated environments [25]. In contrast to all of the above, we are interested in a different problem: generating accurate spatial binaural sound from videos.…”
Section: Introduction (mentioning)
confidence: 99%
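
The depth-prediction direction mentioned above can likewise be sketched, again under assumed shapes: the encoder-decoder layout, the binaural spectrogram input, and the output resolution below are illustrative choices, not the cited method's design.

```python
# Minimal sketch of predicting a coarse depth map from an echo response.
import torch
import torch.nn as nn

class EchoToDepth(nn.Module):
    def __init__(self):
        super().__init__()
        # Compress the two-channel echo spectrogram into a compact feature.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # (B, 64)
        )
        # Expand the feature into a low-resolution depth map, then upsample.
        self.decoder = nn.Sequential(
            nn.Linear(64, 16 * 16), nn.Unflatten(1, (1, 16, 16)),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(1, 1, 3, padding=1), nn.ReLU(),    # depths are non-negative
        )

    def forward(self, echo_spec: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(echo_spec))     # (B, 1, 128, 128)

model = EchoToDepth()
echo = torch.randn(4, 2, 64, 64)    # binaural echo spectrograms
depth = model(echo)                 # predicted depth maps
loss = nn.functional.l1_loss(depth, torch.rand(4, 1, 128, 128))
```
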
“…Cross-modal learning is explored to understand the natural synchronisation between visuals and audio [3,5,39]. Audio-visual data is leveraged for audio-visual speech recognition [12,28,59,62], audio-visual event localization [51,52,55], sound source localization [4,29,45,49,51,60], self-supervised representation learning [25,31,35,37,39], generating sounds from video [10,19,38,64], and audio-visual source separation for speech [1,2,13,16,18,37], music [20,22,56,60,61], and objects [22,24,53]. In contrast to all these methods, we perform a different task: producing binaural two-channel audio from a monaural audio clip using a video's visual stream.…”
Section: Introduction (mentioning)
confidence: 99%
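
The mono-to-binaural task this excerpt describes is often parameterized as predicting the left-minus-right difference signal from the mono mixture, conditioned on visual features. The sketch below assumes that parameterization together with placeholder shapes and module sizes; it is not the cited paper's implementation.

```python
# Minimal sketch of mono-to-binaural generation conditioned on visual features.
# Inputs are real/imaginary STFT channels; all shapes are assumptions.
import torch
import torch.nn as nn

class Mono2Binaural(nn.Module):
    def __init__(self, visual_dim: int = 512):
        super().__init__()
        # Encode the mono mixture spectrogram (real + imaginary channels).
        self.audio_enc = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Project pooled visual features so they can be broadcast over the
        # spectrogram's time-frequency grid.
        self.visual_proj = nn.Linear(visual_dim, 64)
        # Predict the real/imaginary spectrogram of the left-minus-right
        # difference signal.
        self.decoder = nn.Conv2d(128, 2, 3, padding=1)

    def forward(self, mono_spec: torch.Tensor, visual_feat: torch.Tensor) -> torch.Tensor:
        a = self.audio_enc(mono_spec)                   # (B, 64, F, T)
        v = self.visual_proj(visual_feat)               # (B, 64)
        v = v[:, :, None, None].expand(-1, -1, a.size(2), a.size(3))
        return self.decoder(torch.cat([a, v], dim=1))   # (B, 2, F, T)

# Given the predicted difference D and the mono mixture M, the channels
# are recovered as left = (M + D) / 2 and right = (M - D) / 2.
model = Mono2Binaural()
mono = torch.randn(4, 2, 256, 64)   # real/imag STFT of mono mixtures
vis = torch.randn(4, 512)           # pooled visual features per clip
diff = model(mono, vis)             # predicted difference spectrograms
left, right = (mono + diff) / 2, (mono - diff) / 2
```
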