Interspeech 2020
DOI: 10.21437/interspeech.2020-1282
Automatic Scoring at Multi-Granularity for L2 Pronunciation

Cited by 17 publications (13 citation statements)
References 7 publications
“…All model and feature configurations are compared with PCC and MSE metrics, following the setup in previous work [15,21].…”
Section: Results (mentioning)
confidence: 99%
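The two metrics referenced in this statement can be computed directly; the following is a minimal sketch, assuming predicted scores and human ratings are aligned one-to-one. The arrays and values are hypothetical placeholders, not data from the cited work.

```python
# Minimal sketch: evaluate predicted pronunciation scores against human ratings
# with PCC and MSE. All values below are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

def evaluate_scoring(predicted, human):
    """Return (PCC, MSE) between predicted and human scores."""
    predicted = np.asarray(predicted, dtype=float)
    human = np.asarray(human, dtype=float)
    pcc, _ = pearsonr(predicted, human)              # Pearson correlation coefficient
    mse = float(np.mean((predicted - human) ** 2))   # mean squared error
    return pcc, mse

# Hypothetical example: sentence-level scores on a 0-5 scale.
pcc, mse = evaluate_scoring([3.1, 4.2, 2.0, 4.8], [3.0, 4.5, 2.2, 5.0])
print(f"PCC={pcc:.3f}, MSE={mse:.3f}")
```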
“…For Malay and Tamil, the average rating scores were used as ground-truth scores. For each corpus, multiple inter-rater PCCs were calculated, each between the scores of one rater and the average scores of all remaining raters [21]. By averaging all inter-rater PCCs, the upper bound of scoring performance (human performance) was obtained (see the bottom lines in Tables 2-3).…”
Section: Speech Corpora (mentioning)
confidence: 99%
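A minimal sketch of the leave-one-rater-out procedure described above, assuming a small hypothetical rating matrix (raters by utterances); the values are placeholders, not data from the Malay or Tamil corpora.

```python
# Minimal sketch: estimate the human-performance upper bound by correlating each
# rater's scores with the mean of the remaining raters, then averaging the PCCs.
import numpy as np
from scipy.stats import pearsonr

def inter_rater_upper_bound(ratings):
    """ratings: array of shape (n_raters, n_utterances)."""
    ratings = np.asarray(ratings, dtype=float)
    pccs = []
    for i in range(ratings.shape[0]):
        rest = np.delete(ratings, i, axis=0).mean(axis=0)  # mean of the other raters
        pcc, _ = pearsonr(ratings[i], rest)
        pccs.append(pcc)
    return float(np.mean(pccs))  # averaged inter-rater PCC = human performance

# Hypothetical 3 raters x 5 utterances rating matrix.
ratings = np.array([[4, 3, 5, 2, 4],
                    [5, 3, 4, 2, 5],
                    [4, 2, 5, 3, 4]])
print(f"Human-performance upper bound (PCC): {inter_rater_upper_bound(ratings):.3f}")
```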
“…It is well known that the L2 learning process is heavily affected by the well-established habitual perception of phonemes and articulatory motions in the learner's primary language (L1) [1], which often causes mistakes and imprecise articulation in L2 learners' speech production, i.e., negative language transfer [1,2]. As a feasible tool, computer-assisted pronunciation training (CAPT) is often employed to automatically assess L2 learners' pronunciation quality at different levels, e.g., phone-level [3][4][5][6][7][8][9][10][11][12], word-level [13][14][15][16][17] and sentence-level [18][19][20][21][22].…”
Section: Introduction (mentioning)
confidence: 99%
“…Alternatively, the two-step approaches treat pronunciation scoring or mispronunciation detection as a regression or classification task. Specifically, phone, word and sentence boundaries are first generated by forced alignment, and then either frame-level or segment-level pronunciation features within each boundary are fed into task-dependent classifiers or regressors (e.g., [6,8,12,[15][16][17][18][19][20][21][22]). Finally, the posterior probabilities or predicted values obtained from those models are often used as pronunciation scores.…”
Section: Introduction (mentioning)
confidence: 99%
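A minimal sketch of the two-step pipeline described in this statement, under simplifying assumptions: segment boundaries are taken as given rather than produced by an actual forced aligner, frame-level features are mean-pooled per segment, and a ridge regressor stands in for the task-dependent scorer. All names, shapes and values are hypothetical, not the cited systems' implementation.

```python
# Minimal sketch: two-step pronunciation scoring.
# Step 1 (assumed done): forced alignment yields per-phone frame boundaries.
# Step 2: pool frame-level features per segment and regress to a score.
import numpy as np
from sklearn.linear_model import Ridge

def pool_segments(frame_features, boundaries):
    """Mean-pool frame-level features (n_frames, dim) over (start, end) boundaries."""
    return np.stack([frame_features[s:e].mean(axis=0) for s, e in boundaries])

rng = np.random.default_rng(0)
frame_features = rng.normal(size=(200, 40))       # e.g., 200 frames of 40-dim features
boundaries = [(0, 50), (50, 120), (120, 200)]     # hypothetical phone segments
segment_vectors = pool_segments(frame_features, boundaries)

# Task-dependent regressor trained on hypothetical human scores, for illustration only.
human_scores = np.array([3.5, 4.0, 2.5])
regressor = Ridge(alpha=1.0).fit(segment_vectors, human_scores)
print(regressor.predict(segment_vectors))         # predicted values used as scores
```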