Automatic Speech Recognition Using Missing Data Techniques: Handling of Real-World Data

Gemmeke, Jort F.; Segbroeck, Maarten Van; Wang, Yujun; Cranen, B.; hamme, Hugo Van

doi:10.1007/978-3-642-21317-5_7

Cited by 5 publications

(6 citation statements)

References 23 publications

“…First, while it has been shown that MDT can be used to combat reverberation [16,25], to date no method has been presented that enables the estimation of reverberation-dominated features in noisy environments. Second, future work will address the poor performance of current mask estimation methods on speech corrupted by background music, a prevailing problem in searching audio archives.…”

Section: Discussion (mentioning)

confidence: 99%

“…All speech material was simultaneously recorded with four different microphones (channels) at increasing distances, resulting in utterances corrupted by varying levels of noise. We used the method described in [16] to obtain SNR estimates of all utterances.…”

Section: Real-world Data: the SPEECON and SpeechDat-Car Databases (mentioning)

confidence: 99%

“…In Sect. 16.3 we describe the sparse imputation method and the AURORA-2 and Finnish SPEECON [21] databases used for evaluations, and in Sect. 16.…”

Section: Introduction (mentioning)

confidence: 99%

“…16.3 we describe the sparse imputation method and the AURORA-2 and Finnish SPEECON [21] databases used for evaluations, and in Sect. 16.4 we describe and discuss the recognition accuracies that were obtained.…”

Section: Introduction (mentioning)

confidence: 99%

Missing Data Solutions for Robust Speech Recognition

Wang, Gemmeke, Demuynck, et al. 2012

Essential Speech and Language Technology for Dutch

One of the major concerns when deploying speech recognition applications is the lack of robustness of the technology. Humans are robust to noise, different acoustic environments, pronunciation variation, ungrammatical sentences, incomplete utterances, filled pauses, stutters, etc., and this engenders the same expectation for automatic systems. In this contribution we discuss an approach called missing data techniques (MDT) [3,27] to deal with one of these problems: noise robustness. Unlike many previously proposed solutions, MDT can deal with noise exhibiting rapidly changing characteristics, which is often the case in practical deployments. For example, a mobile device used in a city will pick up the noise of cars passing by, of construction sites, of car horns, of people talking or shouting, etc. In a nutshell, MDT is based on the idea that even in noisy speech, some of the features describing the speech signal remain uncorrupted. The goal is to identify the corrupted (missing) features and then to replace (impute) them with clean speech estimates. In this contribution we describe the research carried out in the MIDAS project, which focussed on two aspects of MDT. First, we discuss a novel imputation method to derive clean speech estimates of the corrupted noisy speech features, a method dubbed Sparse Imputation. This method models speech as a linear combination of exemplars, segments of speech, rather than modelling speech using a statistical model. Second, we describe how a state-of-the-art large vocabulary automatic speech recognition (ASR) system based on the prevailing hidden Markov model (HMM) can be made noise robust using conventional MDT.
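The exemplar idea in the abstract above can be illustrated with a small sketch. It is not the MIDAS implementation: the function name `sparse_impute`, the toy dictionary, the L1 weight `lam`, and the choice of a plain iterative soft-thresholding (ISTA) solver are all assumptions made for illustration. The sketch finds sparse exemplar weights using only the features marked reliable, then uses the weighted exemplars to fill in the features marked missing.

```python
import numpy as np

def sparse_impute(noisy, reliable, exemplars, lam=0.01, n_iter=3000):
    """Impute unreliable features from a sparse combination of exemplars.

    noisy     : (D,) observed feature vector
    reliable  : (D,) boolean mask, True where the feature is trusted
    exemplars : (D, N) dictionary, one clean-speech exemplar per column
    lam       : L1 penalty weight (hypothetical default)
    Returns the imputed vector and the exemplar weights.
    """
    A = exemplars[reliable]                  # dictionary rows of reliable features
    b = noisy[reliable]                      # reliable observations
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(exemplars.shape[1])
    for _ in range(n_iter):
        # Gradient step on 0.5 * ||A x - b||^2 ...
        z = x - step * (A.T @ (A @ x - b))
        # ... followed by soft-thresholding, which enforces sparsity.
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    estimate = exemplars @ x                 # reconstruction over all features
    # Keep reliable observations, replace only the missing ones.
    return np.where(reliable, noisy, estimate), x

# Toy data (assumed): six smooth "exemplars" over 20 feature dimensions.
feats = np.arange(20.0)
centers = np.linspace(2, 17, 6)
toy_exemplars = np.exp(-0.5 * ((feats[:, None] - centers[None, :]) / 2.0) ** 2)
toy_clean = 0.7 * toy_exemplars[:, 1] + 0.3 * toy_exemplars[:, 4]
toy_reliable = (np.arange(20) % 3) != 0      # every third feature is "missing"
toy_noisy = toy_clean.copy()
toy_noisy[~toy_reliable] += 5.0              # noise dominates the missing features
toy_imputed, toy_weights = sparse_impute(toy_noisy, toy_reliable, toy_exemplars)
print(np.abs(toy_imputed - toy_clean).max())
```

In this toy setup the mask is given; in practice the mask itself has to be estimated, which is the problem several of the citation statements above point to.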

“…In order to show the speed improvement of MC MDT over a full MDT system [45], i.e., where the CLSQ problem (11) is solved per Gaussian with GD, an acoustic model with Gaussians estimated on PROSPECT features is required. The model has 21,037 PROSPECT Gaussians which are obtained by Single Pass Retraining (SPR) [46] of the acoustic model with MIDA features.…”

Section: Training Backend PROSPECT Model (mentioning)

confidence: 99%

Multi-candidate missing data imputation for robust speech recognition

Wang, Van hamme 2012

J AUDIO SPEECH MUSIC PROC.

The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations imply solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems approximately by selecting the best from a small set of candidate solutions, which are generated as the MDT solutions on a reduced set of cluster Gaussians. Experiments show that the MC MDT runs as fast as the uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of accuracy when compared to frontend MDT.
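The candidate-selection idea in this abstract can be sketched roughly as follows. This is a simplified illustration, not the article's algorithm: it assumes diagonal-covariance Gaussians in a log-spectral-like domain (the article works with PROSPECT features and a general CLSQ problem), it assumes the usual bounded-imputation constraint that a missing clean feature cannot exceed the noisy observation, and all function names are hypothetical. Under these assumptions the constrained solution for one Gaussian is simply its mean clipped at the observation, so candidates can be generated cheaply from a few cluster Gaussians and each backend Gaussian picks the candidate it scores best.

```python
import numpy as np

def bounded_imputation(mean, observed, reliable):
    """Constrained maximum-likelihood point for one diagonal Gaussian.

    Reliable features are kept; missing ones take the Gaussian mean,
    clipped so they never exceed the noisy observation.
    """
    return np.where(reliable, observed, np.minimum(mean, observed))

def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian (sums the last axis)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def multi_candidate_scores(observed, reliable, means, variances, cluster_means):
    """Score every Gaussian using the best of a few shared candidates.

    means, variances : (G, D) backend Gaussians
    cluster_means    : (C, D) reduced set of cluster Gaussians, C << G
    Returns (G,) approximate log-likelihoods and the (G, D) chosen candidates.
    """
    # One candidate imputation per cluster Gaussian: shape (C, D).
    candidates = np.stack(
        [bounded_imputation(c, observed, reliable) for c in cluster_means]
    )
    # scores[g, c] = log-likelihood of candidate c under Gaussian g.
    scores = log_gauss_diag(
        candidates[None, :, :], means[:, None, :], variances[:, None, :]
    )
    best = scores.argmax(axis=1)
    return scores.max(axis=1), candidates[best]

def exact_scores(observed, reliable, means, variances):
    """Reference: solve the constrained problem separately for each Gaussian."""
    imputed = np.stack([bounded_imputation(m, observed, reliable) for m in means])
    return log_gauss_diag(imputed, means, variances)
```

Because every candidate satisfies the constraints, the multi-candidate score can never exceed the exact per-Gaussian score in this simplified setting; how close it gets depends on how well the cluster Gaussians cover the backend model.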
