Active learning for accent adaptation in Automatic Speech Recognition

Nallasamy, Udhyakumar; Metze, Florian; Schultz, Tanja

doi:10.1109/slt.2012.6424250

Cited by 13 publications (7 citation statements)

References 13 publications


“…More recently, [28] Polyphone Decision Trees (PDTs) have been used to model contextual acoustic variants in multiaccented Arabic speech, where PDT adaptation obtained 7% relative WER reduction compared with maximum a posteriori (MAP) [29] accent adaptation, on the Broadcast Conversations (BC) part of LDC GALE corpus. In another study [30] PDT adaptation achieved 13.9% relative improvement in WER compared with accent-specific MAP adaptation. A similar study has been conducted for variations of South African English [31].…”

Section: Pronunciation Modelling (mentioning)

confidence: 95%
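The 7% and 13.9% figures in the statement above are relative WER reductions. They are the drop in WER divided by the baseline (here, MAP-adapted) WER, not a drop in absolute percentage points. A minimal Python sketch of that arithmetic follows. The function name and the WER values in the example are hypothetical and are not taken from the cited studies.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction of `adapted_wer` over `baseline_wer`, in percent.

    Both arguments are absolute WERs (e.g. 40.0 for 40% WER); the baseline must be > 0.
    A negative result means the adapted system is worse than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers: a baseline at 40.0% WER and an adapted system at 37.2% WER
# differ by only 2.8 absolute points, yet that is a 7% relative reduction.
print(f"{relative_wer_reduction(40.0, 37.2):.1f}%")  # prints 7.0%
```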

Automatic accent identification as an analytical tool for accent robust automatic speech recognition

Najafian

Russell

2020

Speech Communication


“…These approaches resemble approaches for acoustic adaptation of VQ codebooks (discussed in section III), in that they learn an accent-specific transition matrix between the phonemic symbols in the dictionary. Selection of utterances for accent adaptation has been explored, with Nallasamy et al [211] proposing an active learning approach.…”

Section: Accent Adaptation (mentioning)

confidence: 99%
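The statement above describes learning an accent-specific transition matrix between the phonemic symbols in the dictionary. Suppose canonical dictionary phones have already been aligned with the phones that accented speakers actually realize. One simple way to estimate such a matrix is then to count the aligned pairs and normalize each row with add-alpha smoothing. The sketch below is a hypothetical illustration of that counting idea, not the specific method of any work cited in the statement. `estimate_phone_transitions` and the toy alignment data are invented for the example.

```python
from collections import Counter, defaultdict


def estimate_phone_transitions(aligned_pairs, phone_set, alpha=1.0):
    """Estimate P(realized phone | canonical dictionary phone) by smoothed counting.

    aligned_pairs: iterable of (canonical, realized) phone pairs, assumed to come
        from aligning dictionary pronunciations with accented speech.
    phone_set: every phone symbol; assumed to contain all symbols in aligned_pairs.
    alpha: add-alpha smoothing mass given to every cell of a row.
    Returns a dict of dicts: matrix[canonical][realized] = probability.
    """
    counts = defaultdict(Counter)
    for canonical, realized in aligned_pairs:
        counts[canonical][realized] += 1

    matrix = {}
    for canonical in phone_set:
        row = counts[canonical]
        total = sum(row.values()) + alpha * len(phone_set)
        # Each row is a proper distribution over phone_set.
        matrix[canonical] = {p: (row[p] + alpha) / total for p in phone_set}
    return matrix


# Toy, hypothetical alignments: /ae/ is often realized as /eh/ and /t/ sometimes as /d/.
phones = ["ae", "eh", "t", "d"]
pairs = [("ae", "eh"), ("ae", "eh"), ("ae", "ae"), ("t", "d"), ("t", "t"), ("t", "t")]
m = estimate_phone_transitions(pairs, phones, alpha=0.5)
print(round(m["ae"]["eh"], 3))  # prints 0.5
```

Phones never seen as a canonical symbol (here /d/) fall back to a uniform row, which is one consequence of the smoothing choice.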

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

Bell

Fainberg

Klejch

et al. 2021

IEEE Open J. Signal Process.


“…Active learning for speech recognition aims at identifying the most informative utterances to be manually transcribed from a large pool of unlabeled speech. This topic has been extensively explored on a number of different fronts, including the use of uncertainty-based sampling to select informative speech samples [6,7,8,9], active learning for low-resource speech recognition [10,11], combined active and semi-supervised learning [12] and active learning for end-to-end ASR systems [13,14]. In active learning, the goal is to select informative speech samples that are subsequently transcribed, while our work focuses on the reverse problem of selecting informative sentences that are subsequently recorded as speech.…”

Section: Related Work (mentioning)

confidence: 99%
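Uncertainty-based sampling, the first family of methods listed in the statement above, works roughly as follows. Decode the unlabeled pool with the current model, rank utterances by confidence, and send the least confident ones for transcription until a budget runs out. The sketch below assumes each utterance already carries a hypothetical per-utterance confidence score (for example, an average word confidence from a first-pass decode) and a duration. The `Utterance` class and `select_least_confident` function are illustrative names. This is one generic form of uncertainty sampling, not the selection criterion of any particular cited work.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    utt_id: str
    confidence: float  # hypothetical first-pass ASR confidence in [0, 1]
    duration_s: float  # audio length in seconds


def select_least_confident(pool, budget_s):
    """Greedy uncertainty sampling: take the least confident utterances first,
    skipping any that would exceed the transcription budget (seconds of audio)."""
    selected, used = [], 0.0
    for utt in sorted(pool, key=lambda u: u.confidence):
        if used + utt.duration_s <= budget_s:
            selected.append(utt.utt_id)
            used += utt.duration_s
    return selected


pool = [
    Utterance("a", 0.92, 4.0),
    Utterance("b", 0.41, 6.0),
    Utterance("c", 0.55, 3.0),
    Utterance("d", 0.30, 8.0),
]
print(select_least_confident(pool, budget_s=12.0))  # prints ['d', 'c']
```

The loop skips an utterance that would overrun the budget instead of stopping. Shorter, slightly more confident utterances can therefore still fill the remaining time.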

“…Existing work on selecting sentences is surprisingly limited to only strategies like enforcing phonetic or word diversity among a selected set of sentences [3,4,5]. In contrast, the reverse problem of selecting utterances to transcribe from an existing unlabeled utterance corpus is called the active learning problem and has been extensively studied [6,7,8,9,10,11]. Our problem is better motivated to the task of personalizing to diverse user accents where large unlabeled utterances are non-existent, and labeled data has to be collected by recording utterances on selected sentences.…”

Section: Introduction (mentioning)

confidence: 99%
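The statement above contrasts active learning with the older strategy of choosing sentences to record by enforcing phonetic diversity. A simple greedy version of that idea adds, at each step, the candidate sentence that covers the most phone bigrams not yet covered. The sketch below assumes pronunciations are already available as phone lists (for instance from a lexicon lookup). The function names and toy data are hypothetical, and this is not claimed to be the procedure of references [3,4,5].

```python
def phone_bigrams(phones):
    """Set of adjacent phone pairs in a pronunciation."""
    return set(zip(phones, phones[1:]))


def greedy_diverse_selection(sentences, k):
    """Pick up to k sentences, each time taking the one that adds the most
    phone bigrams not already covered by the selected set.

    sentences: dict mapping sentence text -> phone sequence (assumed to come
        from a pronunciation lexicon lookup).
    """
    covered, chosen = set(), []
    remaining = dict(sentences)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: len(phone_bigrams(remaining[s]) - covered))
        gain = phone_bigrams(remaining[best]) - covered
        if not gain:
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Toy, hypothetical pronunciations.
sentences = {
    "cat": ["k", "ae", "t"],
    "cats": ["k", "ae", "t", "s"],
    "dog": ["d", "ao", "g"],
}
print(greedy_diverse_selection(sentences, k=2))  # prints ['cats', 'dog']
```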

Error-Driven Fixed-Budget ASR Personalization for Accented Speakers

Awasthi

Kansal

Sarawagi

et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


We consider the task of personalizing ASR models while being constrained by a fixed budget on recording speaker-specific utterances. Given a speaker and an ASR model, we propose a method of identifying sentences for which the speaker's utterances are likely to be harder for the given ASR model to recognize. We assume a tiny amount of speaker-specific data to learn phoneme-level error models which help us select such sentences. We show that speaker's utterances on the sentences selected using our error model indeed have larger error rates when compared to speaker's utterances on randomly selected sentences. We find that fine-tuning the ASR model on the sentence utterances selected with the help of error models yields higher WER improvements in comparison to fine-tuning on an equal number of randomly selected sentence utterances. Thus, our method provides an efficient way of collecting speaker utterances under budget constraints for personalizing ASR models.
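The abstract describes scoring candidate sentences with phoneme-level error models learned from a small amount of speaker data, then recording the sentences expected to be hardest. The sketch below is a much simpler stand-in for that idea, under stated assumptions. It treats each phone as an independent error event with a speaker-specific probability (hypothetical values here, defaulting to 0.05 for phones without an estimate) and ranks sentences by the expected number of phone errors. It is not the error model from the paper; `error_prob`, `expected_phone_errors`, and `select_hard_sentences` are illustrative names.

```python
def expected_phone_errors(phones, error_prob, default=0.05):
    """Expected number of misrecognized phones, treating each phone as an
    independent error event with a (hypothetical) speaker-specific probability."""
    return sum(error_prob.get(p, default) for p in phones)


def select_hard_sentences(sentences, error_prob, budget):
    """Rank candidate sentences by expected phone errors; keep the top `budget`."""
    ranked = sorted(
        sentences,
        key=lambda s: expected_phone_errors(sentences[s], error_prob),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical per-phone error probabilities for one accented speaker.
error_prob = {"th": 0.6, "r": 0.4, "v": 0.3}
sentences = {
    "think": ["th", "ih", "ng", "k"],
    "red van": ["r", "eh", "d", "v", "ae", "n"],
    "big dog": ["b", "ih", "g", "d", "ao", "g"],
}
print(select_hard_sentences(sentences, error_prob, budget=2))  # prints ['red van', 'think']
```

This score grows with sentence length. If the budget were measured in recording time rather than sentence count, normalizing by the number of phones might be the better choice.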
