Interspeech 2016
DOI: 10.21437/interspeech.2016-1602

Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions

Abstract: Mispronunciation detection is part and parcel of a computer assisted pronunciation training (CAPT) system, facilitating second-language (L2) learners to pinpoint erroneous pronunciations in a given utterance so as to improve their spoken proficiency. This paper presents a continuation of such a general line of research and the major contributions are twofold. First, we present an effective training approach that estimates the deep neural network based acoustic models involved in the mispronunciation detection …

Cited by 12 publications (6 citation statements)
References 22 publications (25 reference statements)

“…A direct comparison of our F1 score with existing work is not possible due to the difference in audio data characteristics. However, as a rough comparison, the current state-of-the art on miscue detection tasks [7,8] report an F1 score of about 0.6 to 0.65, which is comparable to our score of 0.819 on augmented data. However, for real data we obtained a slightly lower F1 score of 0.512.…”
Section: Discussion; citation type: supporting
confidence: 81%
“…To create artificial data with miscues, we inspected existing literature on the different types of miscues. Unlike existing works on binary miscue detection task [7,8], we need miscue types that occur in natural speech of reading. Since our goal is to classify miscue types for reading corpus at the phoneme level, we selected miscue types [9] that are applicable at the phoneme level.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…The paper uses phoneme-level statistics and also does prosody analysis by prosody feature extraction. The work in [16] estimates acoustic models by leveraging deep neural network. All the above methods mentioned above require sufficient amount of speech recordings for training the models for efficient mispronunciation detection.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…In Eqs. (30) and (31), the occupation-data, sum-of-data, and sum-of-square-data of the "numerator" are b…”
Section: Transfer Learning Based GOP Using GMM-HMMs; citation type: mentioning
confidence: 99%