Interspeech 2018
DOI: 10.21437/interspeech.2018-1135

Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with an Acoustic Vector Sensor

Abstract: Deep neural network (DNN) based DOA estimation (DNN-DOAest) methods report superior performance, but they degrade under stronger additive noise and room reverberation. Motivated by our previous work with an acoustic vector sensor (AVS) and the great success of DNN-based speech denoising and dereverberation (DNN-SDD), this paper thoroughly investigates a unified DNN framework for robust DOA estimation. First, a novel DOA cue termed the sub-band inter-sensor data ratio (Sb-ISDR…

Cited by 3 publications (2 citation statements)
References 27 publications
“…The experiments were conducted using speech (English for training and Japanese for testing) mixed with everyday sounds (office printer background or household noise) to train and test the NN for both static and moving speech sources. Wang et al [49] propose the use of an Acoustic Vector Sensor (AVS) to estimate DoA, in conjunction with a network for denoising and dereverberation. The authors' hypothesis is that clean features are better classified than unclean ones, therefore they used a DNN for Signal Denoising and Dereverberation (DNN-SDD), which maps noise and reverberant speech features to their clean versions and uses them as input for a DNN that calculates DoA.…”
Section: Related Work (mentioning)
confidence: 99%
“…CNNs combined with Long Short-Term Memory (LSTM) [29] have been shown to be useful for estimating DoA by using Generalized Cross-Correlation Phase Transform (GCC-PHAT) as input data. Some approaches use neural networks to perform pre-processing such as time-frequency (TF) masking [36], [51], [52] or denoising and dereverberation [49].…”
Section: Introduction (mentioning)
confidence: 99%