ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683086

Analyzing Uncertainties in Speech Recognition Using Dropout

Cited by 16 publications (10 citation statements)
References 16 publications
“…Here, a stochastic pass refers to inference with a dropout realization. This technique is closely related to [24], which uses a model's prediction uncertainty computed using dropout to estimate word error rates. DUST combines ST and pseudo-label filtering based on the ASR model's uncertainty for an unlabeled speech utterance using dropout.…”
Section: DUST (mentioning)
confidence: 99%
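The "stochastic pass" in the excerpt above is plain Monte Carlo dropout: dropout stays active at inference, so each forward pass samples a different sub-network and yields a different transcription. Below is a minimal PyTorch sketch of the idea, not the cited paper's implementation; `decode_fn` and the other names are hypothetical placeholders.

```python
import torch

def enable_dropout(model: torch.nn.Module) -> None:
    # Put the model in eval mode, then switch dropout layers back to
    # train mode so they keep sampling at inference (MC dropout).
    model.eval()
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

def stochastic_passes(model, features, decode_fn, n_passes=5):
    # One transcription per dropout realization. `decode_fn` is a
    # hypothetical helper mapping network outputs to a word sequence.
    enable_dropout(model)
    with torch.no_grad():
        return [decode_fn(model(features)) for _ in range(n_passes)]
```

The spread among the returned transcriptions is what the cited work uses as an uncertainty signal, e.g. to estimate word error rates.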
“…In this case, the pseudo-labels generated by the teacher model for the unlabeled target domain data may be less accurate, which increases the need to apply a pseudo-label filtering strategy. To that end, we propose dropout-based uncertainty-driven self-training (DUST), which filters pseudo-labeled data based on the model's uncertainty about its prediction as measured using the degree of agreement between multiple transcriptions obtained with various realizations of dropout and a reference transcription obtained without dropout [23,24]. We show that DUST is an effective method for mismatched domain adaptation and substantially improves over the baseline model, which is trained on the source domain labeled data only, as well as over iterative ST without filtering [19], whereby the largest gain is observed when the source and target domain mismatch is most severe.…”
Section: Introduction (mentioning)
confidence: 99%
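For illustration, here is a hedged sketch of the agreement-based filter the excerpt describes: a pseudo-label is kept only when every dropout transcription stays close to the no-dropout reference. The normalized-edit-distance criterion and the `threshold` value are assumptions made for this sketch, not values from DUST itself.

```python
def edit_distance(ref, hyp):
    # Standard dynamic-programming Levenshtein distance over word lists.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def keep_pseudo_label(reference, dropout_hyps, threshold=0.3):
    # Accept an utterance only if every dropout transcription stays
    # within `threshold` normalized edit distance of the no-dropout
    # reference; otherwise the model is deemed too uncertain.
    # `threshold` is a hypothetical hyperparameter, not from the paper.
    ref_words = reference.split()
    return all(
        edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1) <= threshold
        for hyp in dropout_hyps
    )
```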
“…Furthermore, they employ ensembles as their primary method of confidence estimation, while we also evaluate temperature scaling and dropout methods. Dropout was previously used for obtaining confidence scores for ASR [34], but our approaches differ: in [34] multiple hypotheses are generated via dropout and then word confidences are assigned based on their frequency of appearance in the aligned hypotheses; in contrast, we aggregate the posterior probabilities and not the hypotheses, which simplifies the procedure as it avoids the alignment step.…”
Section: Related Work (mentioning)
confidence: 99%
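To make the contrast concrete, a small sketch of the posterior-aggregation approach the excerpt favors: because posteriors from every dropout pass share the same time axis, averaging them requires no hypothesis alignment step. It assumes a model returning per-frame logits of shape (time, vocab); all names are illustrative, not from the cited work.

```python
import torch

def aggregated_confidence(model, features, n_passes=5):
    # Average per-frame posteriors over dropout realizations, then read
    # off the winning token's probability at each frame. Assumes
    # dropout was left active (e.g. via enable_dropout above) and that
    # `model(features)` returns logits of shape (time, vocab).
    with torch.no_grad():
        probs = torch.stack(
            [model(features).softmax(dim=-1) for _ in range(n_passes)]
        ).mean(dim=0)
    # Posteriors from every pass share the time axis, so no alignment
    # of competing hypotheses is needed.
    return probs.max(dim=-1).values
```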
“…We also propose to localize the uncertainty of the end-to-end ASR model by applying a dropout mechanism. This method is motivated by recent advances in using DNNs to measure model reliability [14,15]. Conventional methods do not use dropout at decoding time.…”
Section: Semi-Supervised Learning (mentioning)
confidence: 99%
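As a rough illustration of "localizing" uncertainty with dropout at decoding time (an assumption-laden sketch, not the cited method): the variance of the posterior across dropout realizations, computed per frame, marks where the realizations disagree.

```python
import torch

def framewise_disagreement(model, features, n_passes=8):
    # Variance of the posterior across dropout realizations, summed
    # over the vocabulary: high values flag frames where realizations
    # disagree, i.e. where the model is locally uncertain. Assumes
    # dropout layers were kept active at decoding time and that
    # `model(features)` returns logits of shape (time, vocab).
    with torch.no_grad():
        stacked = torch.stack(
            [model(features).softmax(dim=-1) for _ in range(n_passes)]
        )
    return stacked.var(dim=0).sum(dim=-1)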