Interspeech 2021
DOI: 10.21437/interspeech.2021-1467

Transformer Based End-to-End Mispronunciation Detection and Diagnosis

Cited by 22 publications (12 citation statements).
References 19 publications.

“…Goodness-of-pronunciation (GOP) is among the first DNN-based methods to MDD, which relies on phone posterior outputs from an automatic speech recognizer (ASR) [1,2,23] to evaluate phonetic errors. More recently, end-to-end phoneme recognition has been studied [5,3,4,8,24], among which [4] and [24] also explored fine-tuning Wav2vec 2.0. Our proposed method differs with them in that we investigate the usage of unlabeled target domain speech to enhance MDD performance.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
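The quoted passage says GOP relies on phone posteriors from an ASR to evaluate phonetic errors. A minimal sketch of one common simplified GOP variant follows: each canonical phone is scored by the mean frame-level log posterior over its force-aligned segment, and phones below a threshold are flagged. This is an assumption for illustration, not the original likelihood-ratio GOP and not the cited papers' method; the posterior matrix, alignment, phone labels, and the threshold value are all hypothetical toy inputs.

```python
import numpy as np

def gop_scores(log_posteriors, alignment, phone_ids):
    """Score each canonical phone by its mean frame-level log posterior.

    log_posteriors: (T, P) array of per-frame log posteriors from an acoustic model.
    alignment: list of (phone, start_frame, end_frame), end exclusive, from forced alignment.
    phone_ids: dict mapping phone labels to posterior columns.
    """
    scores = []
    for phone, start, end in alignment:
        # Log posterior of the canonical phone on the frames aligned to it.
        segment = log_posteriors[start:end, phone_ids[phone]]
        scores.append((phone, float(segment.mean())))
    return scores

def flag_mispronunciations(scores, threshold):
    """Return phones whose GOP-style score falls below a (hypothetical) threshold."""
    return [phone for phone, score in scores if score < threshold]

# Toy example: 4 frames, 2 phone classes; the numbers are made up for illustration.
phone_ids = {"AE": 0, "T": 1}
posteriors = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
alignment = [("AE", 0, 2), ("T", 2, 4)]

scores = gop_scores(np.log(posteriors), alignment, phone_ids)
flagged = flag_mispronunciations(scores, threshold=-0.5)
print(scores)
print(flagged)  # ['T']
```

Here "AE" averages about -0.16 and "T" about -0.64, so only "T" falls below -0.5. In practice the threshold would be tuned on annotated data.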
“…We compute per-speaker PERs between the recognized phonemes and the canonical phonemes given by a grapheme-to-phoneme model. Then we select 10 highest-PER speakers and 10 lowest-PER speakers, and randomly sample 10 utterances for each of the 20 speakers. 17 human listeners score the accentedness (scale: 1-9 where 1 means heavy accent) and intelligibility (scale: 0-100 where 0 means not intelligible at all) of the sampled utterances.…”
Section: Open Test: Indian Accent and Intelligibility Assessment (citation type: mentioning)
confidence: 99%
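The speaker-selection procedure in the quote above can be sketched as follows. Assumptions, none stated in the quote: PER is the Levenshtein edit distance between canonical and recognized phone sequences, pooled per speaker (total errors divided by total canonical phones), and the canonical and recognized sequences are already available. The function names and toy data are hypothetical; the quote uses k = 10 speakers per group and n = 10 utterances per speaker.

```python
import random
from collections import defaultdict

def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion of a canonical phone
                         cur[j - 1] + 1,          # insertion of a recognized phone
                         prev[j - 1] + (r != h))  # match or substitution
        prev = cur
    return prev[-1]

def per_speaker_per(utterances):
    """Pooled PER per speaker: total edit errors / total canonical phones."""
    errors, lengths = defaultdict(int), defaultdict(int)
    for speaker, canonical, recognized in utterances:
        errors[speaker] += edit_distance(canonical, recognized)
        lengths[speaker] += len(canonical)
    return {spk: errors[spk] / lengths[spk] for spk in errors}

def select_extremes(pers, k):
    """Return (k highest-PER speakers, k lowest-PER speakers)."""
    ranked = sorted(pers, key=pers.get)
    return ranked[::-1][:k], ranked[:k]

def sample_utterances(utterances, speakers, n, seed=0):
    """Randomly sample up to n utterances for each selected speaker."""
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for utt in utterances:
        by_speaker[utt[0]].append(utt)
    return {spk: rng.sample(by_speaker[spk], min(n, len(by_speaker[spk])))
            for spk in speakers}

# Toy data: (speaker, canonical phones from G2P, recognized phones).
data = [
    ("s1", ["K", "AE", "T"], ["K", "AE", "T"]),
    ("s2", ["K", "AE", "T"], ["K", "EH", "T"]),
    ("s3", ["D", "AO", "G"], ["D", "G"]),
    ("s3", ["K", "AE", "T"], ["G", "EH", "T"]),
]
pers = per_speaker_per(data)          # s1: 0.0, s2: 1/3, s3: 0.5
high, low = select_extremes(pers, k=1)
samples = sample_utterances(data, high + low, n=1)
print(high, low)  # ['s3'] ['s1']
```

Pooling errors over all of a speaker's utterances weights longer utterances more heavily; averaging per-utterance PERs is an equally plausible reading of "per-speaker PER".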
“…The computer-assisted pronunciation training (CAPT) system, which can conduct assessments and provide detailed feedback on pronunciation proficiency, is thus attracting attention as an ESL learning service and platform [1,2]. There are two technical approaches to the CAPT system: mispronunciation detection and diagnosis (MDD) [3,4,5,6,7,8,9,10,11,12] and automatic pronunciation assessment [6,13,14,15,16,17,10]. MDD is a task of detecting pronunciation errors by calculating multiple measures using estimated and canonical phones from an automatic speech recognizer.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
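The last quote describes MDD as comparing the phones estimated by a recognizer against the canonical phones. A minimal sketch of that comparison, assuming a plain Levenshtein alignment: each canonical phone is labeled correct, substitution, or deletion, and extra recognized phones are labeled insertions; for substitutions, the recognized phone serves as the diagnosis. This is an illustrative assumption, not the specific measures used in the cited works, and when several alignments have equal cost, the tie-breaking order here picks one of them.

```python
def align(canonical, recognized):
    """Levenshtein alignment as (canonical_phone or None, recognized_phone or None) pairs."""
    n, m = len(canonical), len(recognized)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1]))
    # Backtrace from the bottom-right cell, preferring match/substitution moves.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1]):
            pairs.append((canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

def diagnose(canonical, recognized):
    """Label each aligned pair; a substitution's recognized phone is the diagnosis."""
    report = []
    for c, r in align(canonical, recognized):
        if c == r:
            label = "correct"
        elif c is None:
            label = "insertion"
        elif r is None:
            label = "deletion"
        else:
            label = "substitution"
        report.append((c, r, label))
    return report

# Toy example: canonical "thinking"-like fragment vs. a recognizer output.
report = diagnose(["TH", "IH", "NG", "K"], ["S", "IH", "NG"])
for entry in report:
    print(entry)
```

Evaluating such a detector properly would also need human-annotated actual pronunciations, against which detection precision, recall, and F1 are usually computed.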