2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022
DOI: 10.1109/wacv51458.2022.00221

Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention

Cited by 12 publications (2 citation statements)
References 24 publications
“…Researchers have used these to solve the problem in other domains, such as image, audio, and video. Various transformers are proposed to handle a set of modalities such as video with text, image with text, and image with depth [52]. These are famous as Multimodal transformers [53], [54].…”
Section: Previous Work (mentioning)
confidence: 99%
“…It would be nice if the two approaches can be connected together. This paper showed a method for generating binaural audio with a deep neural network instead of HRTFs [11]. They also included a system to extract positional information from images, and with the model, they were also able to estimate a depth map to aid the generation of the final binaural signal, which is something we have not implemented in our system.…”
Section: Related Work (mentioning)
confidence: 99%