2017
DOI: 10.1121/1.5011159
|View full text |Cite
|
Sign up to set email alerts
|

A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection

Abstract: Goodness of pronunciation (GOP) is the most widely used method for automatic mispronunciation detection. In this paper, a transfer learning approach to GOP based mispronunciation detection when applying maximum F1-score criterion (MFC) training to deep neural network (DNN)-hidden Markov model based acoustic models is proposed. Rather than train the whole network using MFC, a DNN is used, whose hidden layers are borrowed from native speech recognition with only the softmax layer trained according to the MFC obj… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
11
0

Year Published

2019
2019
2024
2024

Publication Types

Select...
4
3
1

Relationship

0
8

Authors

Journals

citations
Cited by 23 publications
(11 citation statements)
references
References 34 publications
0
11
0
Order By: Relevance
“…Pronunciation is a general term that includes several distinct features present in human languages. The correct pronunciation is hard and challenging to measure because there is no universal definition for correctness in the context of human languages, but there has been some research in this domain [5,6].…”
Section: Arabic Phonemes and Their Pronunciationmentioning
confidence: 99%
“…Pronunciation is a general term that includes several distinct features present in human languages. The correct pronunciation is hard and challenging to measure because there is no universal definition for correctness in the context of human languages, but there has been some research in this domain [5,6].…”
Section: Arabic Phonemes and Their Pronunciationmentioning
confidence: 99%
“…Recently, progress in deep learning for ASR [8] triggered new work on applying deep neural networks (DNNs) for pronunciation scoring [9,10,11,12,13], obtaining improvements over traditional methods for both the above-mentioned groups. Notably, methods of the second group usually rely on transfer learning approaches to mitigate the problem of data scarcity.…”
Section: Introductionmentioning
confidence: 99%
“…However, the DNN-HMM models involved with subphonemic (senone) posterior probabilities and those cannot be mapped directly with HMM state transition probabilities because each senone is shared across many states [19]. Due to this, DNN-HMM based formulations introduced variants to the GoP without considering transition probabilities [17,[20][21][22]. Wenping et al computed the scores using senone posterior probability [17].…”
Section: Introductionmentioning
confidence: 99%
“…They also proposed another score by including the senone prior probabilities [20]. Further, the score as well as the features computed based on these score were used in the mispronunciation detection [23] considering transfer learning approach and pronunciation evaluation [22].…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation