2012 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2012.6424250

Active learning for accent adaptation in Automatic Speech Recognition

Abstract: We introduce a novel active learning algorithm for speech recognition in the context of accent adaptation. We adapt a source recognizer on the target accent by selecting a matched subset of utterances from a large, untranscribed and multiaccented corpus for human transcription. Traditionally, active learning in speech recognition has relied on uncertainty-based sampling to choose the most informative samples for manual labeling. Such an approach doesn't include an explicit relevance criterion during data selection…
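The abstract is truncated above, so the paper's exact selection criterion is not shown. As a rough, hypothetical sketch of the idea it describes, the Python below ranks untranscribed utterances by a weighted mix of an uncertainty term and an accent-relevance term; the names confidence, accent_score, and alpha are illustrative assumptions, not the paper's notation.

```python
import heapq

def select_for_transcription(pool, confidence, accent_score, k=100, alpha=0.5):
    """Rank untranscribed utterances for manual transcription.

    pool:         iterable of utterance IDs
    confidence:   dict mapping ID -> recognizer confidence in [0, 1];
                  low confidence = informative (classic uncertainty sampling)
    accent_score: dict mapping ID -> similarity to the target accent in [0, 1],
                  e.g. from an accent classifier; this relevance term is what
                  plain uncertainty sampling lacks
    alpha:        trade-off between informativeness and accent relevance
    """
    def score(utt):
        informativeness = 1.0 - confidence[utt]
        relevance = accent_score[utt]
        return alpha * informativeness + (1.0 - alpha) * relevance

    # Return the k highest-scoring utterances for human transcription.
    return heapq.nlargest(k, pool, key=score)
```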

Cited by 13 publications (7 citation statements); references 13 publications. Citing publications span 2014–2023.
“…More recently, Polyphone Decision Trees (PDTs) have been used to model contextual acoustic variants in multiaccented Arabic speech [28], where PDT adaptation obtained a 7% relative WER reduction compared with maximum a posteriori (MAP) [29] accent adaptation on the Broadcast Conversations (BC) part of the LDC GALE corpus. In another study [30], PDT adaptation achieved a 13.9% relative improvement in WER compared with accent-specific MAP adaptation. A similar study has been conducted for varieties of South African English [31].…”
Section: Pronunciation Modelling
confidence: 95%
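The MAP accent-adaptation baseline [29] mentioned in this excerpt is not spelled out there. For reference, the standard MAP re-estimation of a Gaussian mean and the relative-WER-reduction figure used in such comparisons take the following form (standard textbook formulas, not taken from the cited papers):

```latex
% Standard MAP mean update for Gaussian m (Gauvain & Lee style):
\[
\hat{\mu}_m = \frac{\tau\,\mu_m + \sum_{t}\gamma_m(t)\,x_t}
                   {\tau + \sum_{t}\gamma_m(t)}
\]
% tau: prior weight; mu_m: source-model mean; gamma_m(t): occupation
% probability of Gaussian m at frame t; x_t: adaptation feature vector.

% "Relative WER reduction" as quoted in such comparisons:
\[
\Delta_{\mathrm{rel}} =
\frac{\mathrm{WER}_{\mathrm{baseline}} - \mathrm{WER}_{\mathrm{adapted}}}
     {\mathrm{WER}_{\mathrm{baseline}}} \times 100\%
\]
```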
“…These approaches resemble methods for acoustic adaptation of VQ codebooks (discussed in Section III), in that they learn an accent-specific transition matrix between the phonemic symbols in the dictionary. Selection of utterances for accent adaptation has also been explored, with Nallasamy et al. [211] proposing an active learning approach.…”
Section: Accent Adaptation
confidence: 99%
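As a minimal sketch of the accent-specific phone transition matrix described in this excerpt (the cited systems differ in detail), the Python below estimates P(realized phone | canonical phone) from aligned pronunciation pairs; the input format and smoothing are assumptions for illustration.

```python
from collections import Counter, defaultdict

def phone_transition_matrix(aligned_pairs, smoothing=1.0):
    """Estimate P(realized phone | canonical phone) from aligned pairs.

    aligned_pairs: iterable of (canonical_phone, realized_phone) tuples,
                   e.g. from aligning dictionary pronunciations with
                   phone-level decodes of accented speech (assumed input
                   format; the cited systems differ in detail).
    smoothing:     add-lambda smoothing over the realizations observed
                   for each canonical phone.
    """
    counts = defaultdict(Counter)
    for canonical, realized in aligned_pairs:
        counts[canonical][realized] += 1

    matrix = {}
    for canonical, realized_counts in counts.items():
        total = sum(realized_counts.values()) + smoothing * len(realized_counts)
        matrix[canonical] = {
            phone: (count + smoothing) / total
            for phone, count in realized_counts.items()
        }
    return matrix
```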
“…Active learning for speech recognition aims to identify the most informative utterances to be manually transcribed from a large pool of unlabeled speech. This topic has been extensively explored on a number of different fronts, including the use of uncertainty-based sampling to select informative speech samples [6,7,8,9], active learning for low-resource speech recognition [10,11], combined active and semi-supervised learning [12], and active learning for end-to-end ASR systems [13,14]. In active learning, the goal is to select informative speech samples that are subsequently transcribed, while our work focuses on the reverse problem of selecting informative sentences that are subsequently recorded as speech.…”
Section: Related Work
confidence: 99%
“…Existing work on selecting sentences is surprisingly limited to strategies like enforcing phonetic or word diversity among a selected set of sentences [3,4,5]. In contrast, the reverse problem of selecting utterances to transcribe from an existing unlabeled utterance corpus is called the active learning problem and has been extensively studied [6,7,8,9,10,11]. Our problem is better motivated by the task of personalizing to diverse user accents, where large unlabeled utterance corpora are non-existent and labeled data has to be collected by recording utterances for selected sentences.…”
Section: Introduction
confidence: 99%
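As a minimal sketch of the phonetic-diversity selection strategy this excerpt refers to, the Python below greedily picks sentences that add the most not-yet-covered phonemes; phonemes_of is a hypothetical helper (e.g. a grapheme-to-phoneme lookup), not from the cited work.

```python
def select_diverse_sentences(candidates, phonemes_of, k):
    """Greedily pick k sentences that maximize phoneme coverage.

    candidates:  list of sentence strings
    phonemes_of: callable mapping a sentence to the set of phonemes it
                 contains (hypothetical helper, e.g. a G2P lookup)
    """
    covered, selected = set(), []
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # Pick the sentence contributing the most new phonemes.
        best = max(remaining, key=lambda s: len(phonemes_of(s) - covered))
        selected.append(best)
        covered |= phonemes_of(best)
        remaining.remove(best)
    return selected
```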