Synthetic Data for DNN-Based DOA Estimation of Indoor Speech

Gelderblom, Femke B.; Liu, Yi; Kvam, Johannes; Myrvoll, Tor André

doi:10.1109/icassp39728.2021.9414415

Cited by 9 publications (5 citation statements)

References 16 publications (26 reference statements)

“…We relied on the DNS Challenge 2021 speech and noise data, as it is a high quality database that covers multiple languages and many different types of noises. For the RIRs, we used the ISM-dir dataset described in [30]. These RIRs are simulated using the image source method with the addition that all speaker sources are modelled as directive sources with an average male/female speaker pattern directivity.…”

Section: Training Data (mentioning)

confidence: 99%

On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks

Gelderblom, Tronstad, Svendsen et al. 2024

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems are expected to improve the intelligibility of degraded speech as measured by OIMs. However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.

Section: Training Data (mentioning)

confidence: 99%

“…RIRs were then recorded with the same microphone array in the same room, at various speaker positions and orientations. More details on how these RIRs were obtained can be found in [30]. We included both the RIR recordings for speakers facing the array, and the RIRs for speakers facing away at a 90 degree angle.…”

Section: Evaluation, A. Evaluation Data (mentioning)

confidence: 99%

On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks

Gelderblom, Tronstad, Svendsen et al. 2024

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

“…We relied on the DNS Challenge 2021 speech and noise data, as it is a high quality database that covers multiple languages and many different types of noises. For the RIRs, we used the ISM-dir dataset described in [27]. These RIRs are simulated using the image source method with the addition that all speaker sources are modelled as directive sources with an average male/female speaker pattern directivity.…”

Section: Training Data (mentioning)

confidence: 99%

“…RIRs were then recorded with the same microphone array in the same room, at various speaker positions and orientations. More details on how these RIRs were obtained can be found in [27]. We included both the RIR recordings for speakers looking towards the array, and the RIRs for speakers looking away at a 90 degree angle.…”

Section: Evaluation, A. Evaluation Data (mentioning)

confidence: 99%

On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks

Gelderblom, Tronstad, Svendsen et al. 2022

Preprint

Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning based SE systems are expected to improve the intelligibility of degraded speech as measured by OIMs. However, validation of the OIMs for this purpose is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs. We compare the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners, and subjectively tested both single channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech systems. We find that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of 'enhanced' speech signals.

“…This method shows similar performance compared to the usual ISM, while being computationally more efficient. An investigation of several simulation methods has been done in [234], with extensions of ISM, namely ISM with directional sources, and ISM with a diffuse field due to scattering. The authors of [234] compared the simulation algorithms via the training of an MLP (in both regression and classification modes) and showed that ISM with scattering effect and directional sources leads to the best SSL performance.…”

Section: A. Synthetic Data (mentioning)

confidence: 99%

A Survey of Sound Source Localization with Deep Learning Methods

Grumiaux, Kitić, Girin et al. 2021

Preprint

This article is a survey on deep learning methods for single and multiple sound source localization. We are particularly interested in sound source localization in indoor/domestic environment, where reverberation and diffuse noise are present. We provide an exhaustive topography of the neural-based localization literature in this context, organized according to several aspects: the neural network architecture, the type of input features, the output strategy (classification or regression), the types of data used for model training and evaluation, and the model training strategy. This way, an interested reader can easily comprehend the vast panorama of the deep learning-based sound source localization methods. Tables summarizing the literature survey are provided at the end of the paper for a quick search of methods with a given set of target characteristics.
