ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414750

Implicit HRTF Modeling Using Temporal Convolutional Networks

Abstract: Estimation of accurate head-related transfer functions (HRTFs) is crucial to achieve realistic binaural acoustic experiences. HRTFs depend on source/listener locations and are therefore expensive and cumbersome to measure; traditional approaches require listener-dependent measurements of HRTFs at thousands of distinct spatial directions in an anechoic chamber. In this work, we present a data-driven approach to learn HRTFs implicitly with a neural network that achieves state-of-the-art results compared to traditi…

Cited by 15 publications (6 citation statements)
References 23 publications (22 reference statements)
“…The hyper-convolution layer is a linear module of which the weights and biases are generated from given conditions. In the work by [25,19], the hyper-convolution was used for conditioning the temporal convolutional network that operates in the time domain. We adopt a similar setting with [25], yet without dilation and different kernel size, since our HyperConv deals with encoded positions and anthropometric features.…”
Section: Network Architecture (mentioning)
confidence: 99%
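The hyper-convolution idea quoted above, a linear layer whose weights and biases are generated from a condition, can be sketched roughly as follows. This is a minimal illustration and not the implementation from the cited works: the function name `hyper_conv1d`, the linear generators `w_gen` and `b_gen`, and the causal zero padding are all assumptions made for the example.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def hyper_conv1d(x, cond, w_gen, b_gen, kernel_size):
    """1-D convolution whose kernel and bias are generated from `cond`.

    x      : input signal, list of in_ch channels, each of length T
    cond   : condition vector of length cond_dim (e.g. an encoded position)
    w_gen  : hypothetical linear weight generator,
             shape (out_ch * in_ch * kernel_size, cond_dim)
    b_gen  : hypothetical linear bias generator, shape (out_ch, cond_dim)
    Returns a list of out_ch channels, each of length T.
    """
    in_ch, T = len(x), len(x[0])
    out_ch = len(b_gen)

    # The condition produces the layer parameters (assumed linear mapping).
    flat_w = matvec(w_gen, cond)
    bias = matvec(b_gen, cond)

    def weight(o, i, k):
        return flat_w[(o * in_ch + i) * kernel_size + k]

    # Assumed causal zero padding on the left so the output keeps length T.
    padded = [[0.0] * (kernel_size - 1) + list(ch) for ch in x]

    y = []
    for o in range(out_ch):
        row = []
        for t in range(T):
            acc = bias[o]
            for i in range(in_ch):
                for k in range(kernel_size):
                    acc += weight(o, i, k) * padded[i][t + k]
            row.append(acc)
        y.append(row)
    return y
```

Calling `hyper_conv1d` twice on the same input with different `cond` vectors applies two different filters, which is the sense in which the layer is conditioned.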
“…Recently, data-driven methods for binaural synthesis have been widely studied [16,17,18,19]. These methods inherently require a sufficiently large number of data points to estimate the distribution of interest.…”
Section: Introduction (mentioning)
confidence: 99%
“…Binaural audio synthesis has traditionally relied on signal processing techniques that model the physics of human spatial hearing as a linear time-invariant system [17][18][19][20]. More recently, there has been a line of studies on neural synthesis of binaural audio that showed the advantages of the data-driven approaches [2][3][4][21][22][23][24][25]. We will refer to these models, illustrated at the top right of Fig.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although binaural synthesis has recently experienced a breakthrough based on neural audio rendering techniques [2][3][4] that allow to learn binauralization and spatial audio in a data-driven way, these approaches fall short in their ability to faithfully model environmental factors such as room reverb and noise floor. The reason these models fail to model stochastic processes is their reliance on direct reconstruction losses on waveforms.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, this paper investigates an end-to-end neural network synthesis method that simulates the differences caused by subtle effects on the final output signal through a temporal convolutional neural network module. We conducted extensive experiments on the POP909 dataset [7] and HRFT data [8] to evaluate the effectiveness and generality of the proposed method.…”
Section: Introduction (mentioning)
confidence: 99%