Robust ASR using neural network based speech enhancement and feature simulation

Sivasankaran, S.; Nugraha, Aditya Arie; Morales-Cordovilla, Juan A.; Dalmia, Siddharth; Illina, Irina

doi:10.1109/asru.2015.7404834

Cited by 31 publications

(49 citation statements)

References 39 publications

“…More work on ground truth estimation is required to close this gap and benefit from simulated training data. In addition, training on real data now leads to a performance decrease on simulated data, while Sivasankaran et al (2015) found it to consistently improve performance on both real and simulated data. Along with the recent results of Nugraha et al (2016b) on another dataset, this suggests that, although weighted EM made little difference for spectral models other than DNN (Liutkus et al, 2015), weighted EM outperforms exact EM for the estimation of multichannel statistics from DNN outputs.…”

Section: Impact Of Ground Truth Estimation (mentioning)

confidence: 99%

“…One possible explanation for the difference observed when training the enhancement DNN of Sivasankaran et al (2015) on real vs. simulated data may be the way the ground truth is estimated rather than the data themselves. Indeed, as shown in Section 2.3.3, the spectrograms of real and simulated data appear to be similar, while the underlying ground truth speech signals, which are estimated from noisy and close-talk signals in the case of real data, look quite different.…”

Section: Impact Of Ground Truth Estimation (mentioning)

confidence: 99%

“…In the following, we do not discuss DNN post-filters, which provided a limited improvement or degradation on both real and simulated data (Hori et al, 2015; Sivasankaran et al, 2015), and we focus on multichannel DNN-based enhancement instead. Table 5 illustrates the performance of the DNN-based time-invariant generalized eigenvalue (GEV) beamformer proposed by Heymann et al (2015).…”

Section: DNN-based Beamforming and Separation (mentioning)

confidence: 99%

“…These results also indicate that the enhancement system is able to leverage the simulated data to learn about the real data and that increasing the amount and variety of simulated data further improves performance. Sivasankaran et al (2015) exploited a DNN to perform multichannel time-varying Wiener filtering instead. The desired DNN outputs are the magnitude spectra of speech and noise, which are computed from the underlying clean speech signals in the case of simulated data or using the procedure described in Section 2.3.1 in the case of real data.…”

Section: DNN-based Beamforming and Separation (mentioning)

confidence: 99%

“…We performed this experiment using the DNN-based multichannel source separation technique of Nugraha et al (2016a), which is a variant of the one of Sivasankaran et al (2015) that relies on exact EM updates for the spatial covariance matrices (Duong et al, 2010) instead of the weighted EM updates of Liutkus et al (2015).…”

Section: Impact Of Ground Truth Estimation (mentioning)

confidence: 99%

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Vincent, Watanabe, Nugraha, et al. 2017. Computer Speech & Language. Self-citation.

Multichannel Clustering and Classification Approaches

Mandel, Araki, Nakatani. 2018. Audio Source Separation and Speech Enhancement.

The CHiME Challenges: Robust Speech Recognition in Everyday Environments

Barker, Marxer, Watanabe. 2017. New Era for Robust Speech Recognition. Self-citation.

The CHiME challenge series has been aiming to advance the development of robust automatic speech recognition for use in everyday environments by encouraging research at the interface of signal processing and statistical modelling. The series has been running since 2011 and is now entering its 4th iteration. This chapter provides an overview of the CHiME series including a description of the datasets that have been collected and the tasks that have been defined for each edition. In particular the chapter describes novel approaches that have been developed for producing simulated data for system training and evaluation, and conclusions about the validity of using simulated data for robust speech recognition development. We also provide a brief overview of the systems and specific techniques that have proved successful for each task. These systems have demonstrated the remarkable robustness that can be achieved through a combination of training data simulation and multicondition training, well-engineered multichannel enhancement and state-of-the-art discriminative acoustic and language modelling techniques.
