Interspeech 2020
DOI: 10.21437/interspeech.2020-1282
Automatic Scoring at Multi-Granularity for L2 Pronunciation

Cited by 17 publications (13 citation statements)
References 7 publications
“…All model and feature configurations are compared with PCC and MSE metrics, following the setup in previous work [15,21].…”
Section: Results (mentioning)
confidence: 99%
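The two metrics referenced in this statement can be computed directly; the following is a minimal sketch, assuming predicted scores and human ratings are aligned one-to-one. The arrays and values are hypothetical placeholders, not data from the cited work.

```python
# Minimal sketch: evaluate predicted pronunciation scores against human ratings
# with PCC and MSE. All values below are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr

def evaluate_scoring(predicted, human):
    """Return (PCC, MSE) between predicted and human scores."""
    predicted = np.asarray(predicted, dtype=float)
    human = np.asarray(human, dtype=float)
    pcc, _ = pearsonr(predicted, human)              # Pearson correlation coefficient
    mse = float(np.mean((predicted - human) ** 2))   # mean squared error
    return pcc, mse

# Hypothetical example: sentence-level scores on a 0-5 scale.
pcc, mse = evaluate_scoring([3.1, 4.2, 2.0, 4.8], [3.0, 4.5, 2.2, 5.0])
print(f"PCC={pcc:.3f}, MSE={mse:.3f}")
```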
“…For Malay and Tamil, the average rating scores were used as ground-truth scores. For each corpus, multiple inter-rater PCCs were calculated, each between the scores of one rater and the average scores of all remaining raters [21]. By averaging all inter-rater PCCs, the upper bound of scoring performance (human performance) was obtained (see the bottom lines in Tables 2-3).…”
Section: Speech Corpora (mentioning)
confidence: 99%
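A minimal sketch of the leave-one-rater-out procedure described above, assuming a small hypothetical rating matrix (raters by utterances); the values are placeholders, not data from the Malay or Tamil corpora.

```python
# Minimal sketch: estimate the human-performance upper bound by correlating each
# rater's scores with the mean of the remaining raters, then averaging the PCCs.
import numpy as np
from scipy.stats import pearsonr

def inter_rater_upper_bound(ratings):
    """ratings: array of shape (n_raters, n_utterances)."""
    ratings = np.asarray(ratings, dtype=float)
    pccs = []
    for i in range(ratings.shape[0]):
        rest = np.delete(ratings, i, axis=0).mean(axis=0)  # mean of the other raters
        pcc, _ = pearsonr(ratings[i], rest)
        pccs.append(pcc)
    return float(np.mean(pccs))  # averaged inter-rater PCC = human performance

# Hypothetical 3 raters x 5 utterances rating matrix.
ratings = np.array([[4, 3, 5, 2, 4],
                    [5, 3, 4, 2, 5],
                    [4, 2, 5, 3, 4]])
print(f"Human-performance upper bound (PCC): {inter_rater_upper_bound(ratings):.3f}")
```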
“…It is well known that the L2 learning process is heavily affected by the well-established habitual perception of phonemes and articulatory motions in the learner's primary language (L1) [1], which often causes mistakes and imprecise articulation in L2 learners' speech production, i.e., negative language transfer [1,2]. As a feasible tool, computer-assisted pronunciation training (CAPT) is often employed to automatically assess L2 learners' pronunciation quality at different levels, e.g., phone-level [3][4][5][6][7][8][9][10][11][12], word-level [13][14][15][16][17] and sentence-level [18][19][20][21][22].…”
Section: Introduction (mentioning)
confidence: 99%
“…Alternatively, the two-step approaches treat pronunciation scoring or mispronunciation detection as a regression or classification task. Specifically, phone, word and sentence boundaries are first generated by forced alignment, and then either frame-level or segment-level pronunciation features within each boundary are fed into task-dependent classifiers or regressors (e.g., [6,8,12,[15][16][17][18][19][20][21][22]). Finally, the posterior probabilities or predicted values obtained from those models are often used as pronunciation scores.…”
Section: Introduction (mentioning)
confidence: 99%
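A minimal sketch of the two-step pipeline described in this statement, under simplifying assumptions: segment boundaries are taken as given rather than produced by an actual forced aligner, frame-level features are mean-pooled per segment, and a ridge regressor stands in for the task-dependent scorer. All names, shapes and values are hypothetical, not the cited systems' implementation.

```python
# Minimal sketch: two-step pronunciation scoring.
# Step 1 (assumed done): forced alignment yields per-phone frame boundaries.
# Step 2: pool frame-level features per segment and regress to a score.
import numpy as np
from sklearn.linear_model import Ridge

def pool_segments(frame_features, boundaries):
    """Mean-pool frame-level features (n_frames, dim) over (start, end) boundaries."""
    return np.stack([frame_features[s:e].mean(axis=0) for s, e in boundaries])

rng = np.random.default_rng(0)
frame_features = rng.normal(size=(200, 40))       # e.g., 200 frames of 40-dim features
boundaries = [(0, 50), (50, 120), (120, 200)]     # hypothetical phone segments
segment_vectors = pool_segments(frame_features, boundaries)

# Task-dependent regressor trained on hypothetical human scores, for illustration only.
human_scores = np.array([3.5, 4.0, 2.5])
regressor = Ridge(alpha=1.0).fit(segment_vectors, human_scores)
print(regressor.predict(segment_vectors))         # predicted values used as scores
```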