Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis

Mao, Shaoguang; Xu, Li; Li, Kun; Wu, Zhiyong; Liu, Xunying; Meng, Helen

doi:10.1109/icassp.2018.8462635

Cited by 12 publications (11 citation statements); references 15 publications.


“…Non-categorical patterns acquisition framework approaches [9][10][11][12][13] proposed to capture the L2 pronunciation deviations from categorical phones. [9,12,13] try to capture each L2 error patterns of a given categorical phone, while [11] focuses on extending a native phone set to include the noncategorical patterns in L2 English speech. Actually, this work is an extension of [11].…”

Section: Introduction (mentioning; confidence: 99%)

“…[9,12,13] try to capture each L2 error patterns of a given categorical phone, while [11] focuses on extending a native phone set to include the noncategorical patterns in L2 English speech. Actually, this work is an extension of [11]. The distinction lies in 1) this approach extracts segment-level features to model the phonetic information while [11] uses non-segmental frame-level features and their average, thereby the accumulated errors are reduced and the discovered non-categories are more accurate with higher confusion degree; 2) this work uses a more simple but effective method to explore non-categories, while not involving k-means clustering method used in [11].…”

Section: Introduction (mentioning; confidence: 99%)

“…Actually, this work is an extension of [11]. The distinction lies in 1) this approach extracts segment-level features to model the phonetic information while [11] uses non-segmental frame-level features and their average, thereby the accumulated errors are reduced and the discovered non-categories are more accurate with higher confusion degree; 2) this work uses a more simple but effective method to explore non-categories, while not involving k-means clustering method used in [11]. This prevents a data imbalance problem and high-dimension data issues in k-means clustering, and hence discovers more noncategories.…”

Section: Introduction (mentioning; confidence: 99%)


Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech

Li, Wu, Liu et al. (2020). Preprint. Self-citation.


Second language (L2) speech is often labeled with native phone categories. However, in many cases it is difficult to decide which categorical phone an L2 segment belongs to; such segments are regarded as non-categories. Most existing approaches to Mispronunciation Detection and Diagnosis (MDD) are concerned only with categorical errors, i.e., a phone category being inserted, deleted, or substituted by another, and do not consider non-categorical errors. To model these non-categorical errors, this work explores non-categorical patterns to extend the categorical phone set. We apply a phonetic segment classifier to generate segmental phonetic posterior-grams (SPPGs) that represent phone segment-level information, and then discover non-categories by looking for SPPGs with more than one peak. Compared with the baseline system, this approach discovers more non-categorical patterns, and perceptual experiments show that the discovered non-categories are more accurate, with the confusion degree increased by 7.3% and 7.5% under two different measures. Finally, we present a preliminary analysis of the reasons behind these non-categories.
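The central criterion in this abstract, flagging a segment as a non-category when its SPPG has more than one peak, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-phone inventory `PHONES`, the fixed posterior `threshold` that defines a "peak", the helper names, and the `min_count` frequency filter are all hypothetical choices made here.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical 4-phone inventory; a real system would use its full phone set.
PHONES = ["ih", "iy", "eh", "ae"]


def peaks(sppg: Sequence[float], threshold: float = 0.2) -> List[Tuple[str, float]]:
    """Phones whose segment-level posterior reaches `threshold` (an assumed
    definition of a "peak"), strongest first; ties keep inventory order."""
    if len(sppg) != len(PHONES):
        raise ValueError("SPPG length must match the phone inventory")
    hits = [(p, v) for p, v in zip(PHONES, sppg) if v >= threshold]
    return sorted(hits, key=lambda pv: pv[1], reverse=True)


def non_category(sppg: Sequence[float], threshold: float = 0.2) -> Optional[Tuple[str, str]]:
    """Name a multi-peak SPPG by its two strongest phones; None if it has < 2 peaks."""
    top = peaks(sppg, threshold)
    if len(top) < 2:
        return None
    return (top[0][0], top[1][0])


def discover(segments: List[Tuple[str, Sequence[float]]], threshold: float = 0.2,
             min_count: int = 2) -> Dict[Tuple[str, Tuple[str, str]], int]:
    """Count (canonical phone, peak pair) over all segments and keep the
    pairs seen at least `min_count` times (hypothetical filter)."""
    counts: Counter = Counter()
    for canonical, sppg in segments:
        pair = non_category(sppg, threshold)
        if pair is not None:
            counts[(canonical, pair)] += 1
    return {key: n for key, n in counts.items() if n >= min_count}
```

For example, two /ih/ segments whose SPPGs split their mass between "ih" and "iy" would surface as the candidate `("ih", ("ih", "iy"))`. A fixed threshold is the simplest peak definition; a ratio to the maximum posterior would be an equally plausible alternative.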


“…To solve this problem, our previous work [18] discovered an Extended Phoneme Set in L2 speech (L2-EPS) by using unsupervised clustering algorithm based on phoneme-based features. It gives a more complete description on pronunciation patterns in L2 speech, some of which are ignored by the native phoneme set.…”

Section: Introduction (mentioning; confidence: 99%)

“…However, there are still some phonetic patterns that cannot be captured well by that approach. Because the phoneme-based phonemic posterior-grams (PPGs) used as clustering features in [18] cannot provide the state information within a segment, which is also important in reflecting the phonetic patterns.…”

Section: Introduction (mentioning; confidence: 99%)
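The limitation quoted above, that phoneme-based PPGs cannot provide state information within a segment, can be illustrated with a toy example. The sketch assumes a 3-state HMM per phone with state posteriors stored phone-major, and derives phoneme posteriors by summing each phone's state posteriors; the function names and the two toy segments are hypothetical. Two segments with different internal state trajectories collapse to identical phoneme-level PPGs.

```python
from typing import List

STATES_PER_PHONE = 3  # assumption: 3-state HMM per phone, phone-major columns


def state_to_phoneme_ppg(state_ppg: List[List[float]], n_phones: int) -> List[List[float]]:
    """Sum each phone's state posteriors to get a phoneme-level PPG per frame."""
    out = []
    for frame in state_ppg:
        if len(frame) != n_phones * STATES_PER_PHONE:
            raise ValueError("frame width must be n_phones * STATES_PER_PHONE")
        out.append([
            sum(frame[p * STATES_PER_PHONE:(p + 1) * STATES_PER_PHONE])
            for p in range(n_phones)
        ])
    return out


def segment_average(ppg: List[List[float]]) -> List[float]:
    """Average frame-level posteriors over all frames of a segment."""
    return [sum(col) / len(ppg) for col in zip(*ppg)]


# Toy 2-phone inventory (6 state columns). Both segments are entirely phone 0,
# but one moves through states 1 -> 2 -> 3 while the other stays in state 1.
progressing = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
]
stuck = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
```

Both segments map to the phoneme-level PPG `[[1.0, 0.0]] * 3`, so any feature built from it (including its segment average) cannot tell them apart, whereas averaging the state-level posteriors still separates them.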

Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis

Liu, Mao et al. (2018). Interspeech 2018. Self-citation.


Second language (L2) speech is often annotated with native phoneme categories. However, L2 speech segments often deviate from the canonical phoneme, and it is sometimes very difficult for linguists to annotate them with any canonical phoneme label. We refer to these segments as non-native phonetic patterns. Existing approaches to mispronunciation detection and diagnosis (MDD) focus mainly on canonical mispronunciations, i.e., one canonical phoneme substituted for another, in addition to deletions and insertions. To better represent L2 speech, this work explores the non-native phonetic patterns (NN-PPs) of each native phoneme with an unsupervised approach. We apply an optimized k-means algorithm to cluster state-based phonemic posterior-grams, which are generated with a deep neural network. Then, to discover the NN-PPs related to each native phoneme, we perform forced alignment to divide L2 speech into segments grouped by native phoneme, and we represent the different phonetic patterns of each native phoneme by the cluster sequences within these segments, derived from the clustering results. Finally, we apply Cluster Sequence Analysis to discover each phoneme's potential NN-PPs. Experiments verify that NN-PPs can extend the native phoneme categories to better describe L2 speech, which can enrich existing MDD approaches for better performance.
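The pipeline in this abstract (cluster frame-level posterior-grams, cut the utterance into forced-aligned phoneme segments, and describe each segment by its cluster sequence) can be sketched in plain Python. Several pieces are assumptions: plain Lloyd's k-means with caller-supplied initial centroids stands in for the paper's optimized k-means; the alignment is a hypothetical list of `(phoneme, start_frame, end_frame)` spans; and simple frequency counting of sequences stands in for Cluster Sequence Analysis, whose details are not given here.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two posterior vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], init: List[Vector], iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means (a stand-in for the paper's optimized variant).
    Returns one cluster index per frame."""
    centroids = [list(c) for c in init]
    labels: List[int] = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every frame.
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        # Update step: mean of each cluster; keep the old centroid if it empties.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sequence(labels: Sequence[int]) -> Tuple[int, ...]:
    """Collapse runs of identical frame labels, e.g. [0, 0, 1, 1, 1] -> (0, 1)."""
    seq: List[int] = []
    for lab in labels:
        if not seq or seq[-1] != lab:
            seq.append(lab)
    return tuple(seq)


def sequences_per_phoneme(labels: List[int],
                          alignment: List[Tuple[str, int, int]]) -> Dict[str, Counter]:
    """For each aligned (phoneme, start, end) span (end exclusive), count how
    often each cluster sequence occurs for that phoneme."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for phone, start, end in alignment:
        table[phone][cluster_sequence(labels[start:end])] += 1
    return table
```

Under this sketch, sequences that recur for a phoneme but differ from its dominant sequence would be the natural candidates to inspect as NN-PPs.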


Analysis of Mispronunciation Detection and Diagnosis Based on Conventional Deep Learning Techniques

Soundarya, Anusuya (2024). Lecture Notes in Networks and Systems.
