Interspeech 2017
DOI: 10.21437/interspeech.2017-464
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models

Abstract: In this paper, we utilize manner and place of articulation features and deep neural network (DNN) models with long short-term memory (LSTM) to improve the detection of phonetic mispronunciations produced by second-language learners. First, we show that speech attribute scores are complementary to conventional phone scores, so they can be concatenated as features to improve a baseline system based only on phone information. Next, pronunciation representation, usually calculated by frame-level avera…
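The abstract describes concatenating conventional phone scores with manner/place speech-attribute scores into a single feature vector for a downstream classifier. A minimal sketch of that concatenation step, with all score values and dimensions invented for illustration:

```python
import numpy as np

# Hypothetical per-phone scores for one utterance (values illustrative):
# a conventional phone-level score plus manner/place attribute scores.
phone_scores = np.array([[-0.4], [-1.8], [-0.2]])        # (num_phones, 1)
attribute_scores = np.array([[-0.3, -0.5],               # (num_phones, 2)
                             [-2.1, -1.6],               # e.g. manner, place
                             [-0.1, -0.4]])

# Concatenate per phone so a downstream model (e.g. an LSTM-based
# classifier) sees both information sources in one feature vector.
features = np.concatenate([phone_scores, attribute_scores], axis=1)
print(features.shape)  # (3, 3)
```

In the paper's setting the combined vectors would then feed the LSTM-based detector; here the point is only that the two score streams are aligned per phone before being joined.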

Cited by 24 publications (6 citation statements) · References 21 publications
“…Detection of mispronunciation has also been achieved by training a binary classifier to learn the decision boundary between correct and erroneous pronunciations. In [6,7], deep neural network (DNN) models have been applied to improve mispronunciation detection. Despite the high performance demonstrated, the training of DNN classifiers requires a large amount of disordered speech.…”
Section: Introduction
Mentioning confidence: 99%
“…The LR classifier on top of the F-GOP algorithm outperformed the baseline because informative features could be added as input to the classifier. Mispronunciation at the phoneme level has been studied before using DNN models and long short-term memory (LSTM) [10]. Various methods such as the log-likelihood ratio [11] and GOP [12] were adopted previously.…”
Section: Related Work
Mentioning confidence: 99%
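The statement above cites GOP (Goodness of Pronunciation) as a standard phone-level scoring method. A sketch of one common frame-posterior formulation of GOP (an illustrative simplification, not the exact recipe of any cited paper):

```python
import numpy as np

def gop(frame_posteriors, canonical_phone):
    """Goodness of Pronunciation from frame-level phone posteriors.

    Sketch: the average log-posterior of the canonical phone minus
    the average log of the best competing posterior, over the frames
    aligned to that phone. Values near 0 suggest good pronunciation.
    """
    logp = np.log(frame_posteriors)              # (T, num_phones)
    canonical = logp[:, canonical_phone].mean()  # target-phone score
    best = logp.max(axis=1).mean()               # best phone per frame
    return canonical - best                      # always <= 0

# Toy example: 3 frames, 4 phone classes, canonical phone index 2.
post = np.array([[0.1, 0.1, 0.7, 0.1],
                 [0.2, 0.1, 0.6, 0.1],
                 [0.1, 0.2, 0.5, 0.2]])
score = gop(post, canonical_phone=2)
```

Here the canonical phone wins every frame, so the score is 0; a mispronounced segment would push it strongly negative, which is what a threshold or classifier then exploits.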
“…It is well-known that the L2 learning process is heavily affected by a well-established habitual perception of phonemes and articulatory motions in the learners' primary language (L1) [1], which often causes mistakes and imprecise articulation in speech productions by the L2 learners, e.g., a negative language transfer [1,2]. As a feasible tool, computer-assisted pronunciation training (CAPT) is often employed to automatically assess L2 learners' pronunciation quality at different levels, e.g., phone-level [3][4][5][6][7][8][9][10][11][12], word-level [13][14][15][16][17] and sentence-level [18][19][20][21][22].…”
Section: Introduction
Mentioning confidence: 99%
“…Alternatively, the two-step approaches treat pronunciation scoring or mispronunciation detection as a regression or classification task. Specifically, phone, word and sentence boundaries are first generated by forced alignment, and then either frame-level or segmental-level pronunciation features within each boundary are fed into task-dependent classifiers or regressors (e.g., [6,8,12,[15][16][17][18][19][20][21][22]). Finally, the posterior probabilities or predicted values obtained from those models are often used as pronunciation scores.…”
Section: Introduction
Mentioning confidence: 99%
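The two-step pipeline quoted above (forced alignment, then boundary-pooled features into a classifier) can be sketched end to end. Everything here is a toy stand-in: the frame features are random, the boundaries are assumed to come from forced alignment, and the classifier weights would be learned in practice.

```python
import numpy as np

# Step 1 (assumed done): forced alignment yields phone boundaries
# in frame indices for one utterance.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))           # (T, feature_dim) frame features
boundaries = [(0, 12), (12, 30), (30, 50)]  # per-phone segments

# Step 2: pool the frame-level features within each boundary into
# one segment-level feature vector (average pooling here).
segments = np.stack([frames[s:e].mean(axis=0) for s, e in boundaries])

# Step 3: a stand-in linear classifier; its sigmoid outputs play the
# role of the posterior probabilities used as pronunciation scores.
w, b = rng.normal(size=8), 0.0
scores = 1.0 / (1.0 + np.exp(-(segments @ w + b)))  # one score per phone
```

The pooling in step 2 is exactly the "frame-level averaging" that the paper's abstract says it improves upon with LSTM-based representations.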