Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks

Tang, Zhenyu; Kanu, John; Hogan, Kevin; Manocha, Dinesh

doi:10.21437/interspeech.2019-1111

Cited by 37 publications (25 citation statements)

References 21 publications


“…Tang et al. found significant performance increases on an automatic speech recognition and keyword spotting task in [15] by using an acoustic simulation method that includes diffuse reflections. Using the same method, Tang et al. also observed improved performance at a DOA estimation task [14].…”

Section: Introduction (mentioning; confidence: 94%)


Synthetic Data for DNN-Based DOA Estimation of Indoor Speech

Gelderblom, Liu, Kvam, et al. (2021)

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


This paper investigates the use of different room impulse response (RIR) simulation methods for synthesizing training data for deep neural network-based direction of arrival (DOA) estimation of speech in reverberant rooms. Different sets of synthetic RIRs are obtained using the image source method (ISM) and more advanced methods including diffuse reflections and/or source directivity. Multi-layer perceptron (MLP) deep neural network (DNN) models are trained on generalized cross correlation (GCC) features extracted for each set. Finally, models are tested on features obtained from measured RIRs. This study shows the importance of training with RIRs from directive sources, as the resulting DOA models achieved up to 51% error reduction compared to the steered response power with phase transform (SRP-PHAT) baseline (significant with p << .01), while models trained with RIRs from omnidirectional sources did worse than the baseline. The performance difference was specifically present when estimating the azimuth of speakers not facing the array directly.
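The abstract above trains MLP models on generalized cross correlation (GCC) features. As an illustration only, here is a minimal sketch of the common GCC-PHAT variant for a single microphone pair using NumPy. The function name `gcc_phat` and its `max_tau`/`interp` parameters are hypothetical, and the paper's exact feature pipeline (microphone pairs, frame length, lag range) is not reproduced here.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None, interp=1):
    """GCC-PHAT between two channels (illustrative sketch).

    Returns (tau, cc): the estimated delay of x2 relative to x1 in seconds
    (positive if x2 lags x1) and the correlation vector centred on zero lag,
    which could serve as a fixed-length feature for a DNN.
    """
    n = len(x1) + len(x2)                # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    R = X2 * np.conj(X1)                 # cross-power spectrum
    R /= np.abs(R) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=interp * n)   # back to the lag domain
    max_shift = interp * n // 2
    if max_tau is not None:
        # keep only physically plausible lags
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # reorder so that index max_shift corresponds to zero lag
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs), cc
```

Setting `max_tau` to the microphone spacing divided by the speed of sound discards lags that no real source could produce. The same PHAT-weighted correlation is also the building block of the SRP-PHAT baseline, which sums such correlations over microphone pairs at the delays implied by each candidate direction.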


“…Inspired by the success of DNNs in many fields, several such approaches have been proposed for sound/speech source localisation (SSL) [7,8,9,10,11,12,13,14].…”

Section: Introduction (mentioning; confidence: 99%)

Synthetic Data for DNN-Based DOA Estimation of Indoor Speech

Gelderblom, Liu, Kvam, et al. (2021)

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


“…This allows us to compute both early reflections and late reverberation efficiently. One speech-related problem that has benefited from more accurate simulations is the direction-of-arrival estimation task [37]. We argue that using a more accurate geometric acoustic simulation that faithfully models the late reverberation for general speech-related training will lead to better performance in learning-based models.…”

Section: Diffuse Acoustic Simulation (mentioning; confidence: 99%)

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

Tang, Chen, et al. (2020)

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self-citation


We present an efficient and realistic geometric sound simulation approach for generating and augmenting training data in speech-related machine learning tasks. Our physically based acoustic simulation method is capable of modeling occlusion, specular and diffuse reflections of sound in complicated acoustic environments, whereas the classical image method can only model specular reflections in simple room settings. We show that by using our synthetic training data, the same models gain significant performance improvement on real test sets in both speech recognition and keyword spotting tasks, without fine-tuning on any real data.
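The abstract contrasts its method with the classical image method, which only models specular reflections in simple (rectangular, "shoebox") rooms. For reference, a minimal sketch of that baseline follows, using the Allen and Berkley image construction. Simplifying assumptions: a single frequency-independent reflection coefficient `beta` for all six walls, and delays rounded to the nearest sample (no fractional-delay filtering). The function `shoebox_rir` and its parameters are hypothetical, not from the paper.

```python
import itertools
import numpy as np

def shoebox_rir(room, src, mic, fs=16000, beta=0.8, max_order=6, c=343.0):
    """Image-source RIR for a rectangular room (specular reflections only).

    room: (Lx, Ly, Lz) in metres; src, mic: positions inside the room.
    beta: one frequency-independent reflection coefficient for every wall
    (simplifying assumption). Returns an impulse response sampled at fs.
    """
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    n = np.arange(-max_order, max_order + 1)
    # every lattice offset (nx, ny, nz) combined with every mirror flag (qx, qy, qz)
    N = np.stack(np.meshgrid(n, n, n, indexing="ij"), axis=-1).reshape(-1, 3)
    Q = np.array(list(itertools.product((0, 1), repeat=3)))
    N_all = np.repeat(N, len(Q), axis=0)
    Q_all = np.tile(Q, (len(N), 1))
    # wall hits per image: |n - q| + |n|, summed over the three axes
    orders = (np.abs(N_all - Q_all) + np.abs(N_all)).sum(axis=1)
    keep = orders <= max_order
    N_all, Q_all, orders = N_all[keep], Q_all[keep], orders[keep]
    images = (1 - 2 * Q_all) * src + 2 * N_all * room
    dist = np.linalg.norm(images - mic, axis=1)
    # nearest-sample delay; amplitude = wall losses times spherical spreading
    delays = np.round(dist / c * fs).astype(int)
    amps = beta ** orders / (4 * np.pi * dist)
    h = np.zeros(delays.max() + 1)
    np.add.at(h, delays, amps)  # accumulate images that land on the same sample
    return h
```

Because every path here is a mirror-image (specular) path, the late part of such an RIR lacks the diffuse scattering and occlusion that the geometric method described in the abstract adds.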


“…Similarly, Bryan estimates the T60 and the direct-to-reverberant ratio (DRR) from a single speech recording via augmented datasets [5]. Tang et al. trained CRNN models purely based on synthetic spatial IRs that generalize to real-world recordings [60]. We strategically design an augmentation scheme to address the challenge of equalization's dependence on both IRs and speaker voice profiles, which is fully complementary to all prior data-driven methods.…”

Section: Related Work (mentioning; confidence: 99%)

Scene-Aware Audio Rendering via Deep Acoustic Analysis

Tang, Bryan, et al. (2020)

IEEE Trans. Visual. Comput. Graphics

Self-citation


Fig. 1: Given a natural sound in a real-world room that is recorded using a cellphone microphone (left), we estimate the acoustic material properties and the frequency equalization of the room using a novel deep learning approach (middle). We use the estimated acoustic material properties for generating plausible sound effects in the virtual model of the room (right). Our approach is general and robust, and works well with commodity devices.

Abstract: We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar-sounding sources with virtual models. Given the captured audio and an approximate geometric model of a real-world room, we present a novel learning-based method to estimate its acoustic material properties. Our approach is based on deep neural networks that estimate the reverberation time and equalization of the room from recorded audio. These estimates are used to compute material properties related to room reverberation using a novel material optimization objective. We use the estimated acoustic material characteristics for audio rendering using interactive geometric sound propagation and highlight the performance on many real-world scenarios. We also perform a user study to evaluate the perceptual similarity between the recorded sounds and our rendered audio.
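The abstract mentions networks that estimate reverberation time from recorded audio. Reference values for that quantity are typically computed from an impulse response; a minimal sketch of the standard Schroeder backward-integration estimate follows, fitting the decay between -5 dB and -25 dB (a T20 fit) and extrapolating to 60 dB. The function name `t60_schroeder` and the fit range are illustrative choices, not details taken from the paper.

```python
import numpy as np

def t60_schroeder(h, fs, start_db=-5.0, end_db=-25.0):
    """Estimate T60 (seconds) from an impulse response h sampled at fs."""
    energy = np.asarray(h, dtype=float) ** 2
    # Schroeder backward integral: energy remaining after each time instant
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10 * np.log10(edc / edc[0] + 1e-300)
    t = np.arange(len(edc_db)) / fs
    # the decay curve is monotone, so this mask selects one contiguous segment
    mask = (edc_db <= start_db) & (edc_db >= end_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # decay rate in dB per second
    return -60.0 / slope                              # time to decay by 60 dB
```

Fitting a sub-range and extrapolating avoids relying on the noisy tail of the decay curve, which in real recordings rarely stays clean for a full 60 dB.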
