Interspeech 2021
DOI: 10.21437/interspeech.2021-1186
Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-Supervised Learning

Cited by 17 publications (5 citation statements). References: 0 publications.
“…Recently, a self-supervised training method, wav2vec2.0 [9], has achieved promising results on CTC models, and the pre-trained model is shown to accelerate the convergence during the fine-tuning stage. However, even with the pre-trained model obtained by wav2vec2.0, the CTC model needs an external language model (LM) to relax its conditional independence assumption [9,10]. Several works have investigated incorporating BERT into a NAR ASR model to achieve better recognition accuracies [11][12][13].…”
Section: Introductionmentioning
confidence: 99%
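The conditional independence assumption mentioned in the statement above can be made concrete with a minimal sketch of greedy CTC decoding (not the cited paper's implementation): each frame is decoded by an independent argmax, so no previously emitted token influences the next one, which is exactly the gap an external LM is used to close.

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy CTC decoding: frame-wise argmax, collapse repeats, drop blanks.

    Because each frame's label is chosen independently of the others,
    no token-level context enters the decision -- the conditional
    independence assumption that an external LM (e.g. via shallow
    fusion during beam search) helps to relax.
    """
    best = log_probs.argmax(axis=-1)  # (T,) best label per frame
    out, prev = [], blank
    for t in best:
        if t != prev and t != blank:
            out.append(int(t))
        prev = t
    return out
```

For example, frame-wise argmax labels `[1, 1, 0, 2, 2]` with blank id 0 collapse to the token sequence `[1, 2]`.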
“…However, multi-dialect ASR is an attractive solution in scenarios where sufficient dialect-specific data or information is not available. Therefore, Liu and Fung (2006); Rao and Sak (2017); Jain et al (2018); Yang et al (2018); Fukuda et al (2018); Jain et al (2019); Viglino et al (2019); Deng et al (2021) attempt to improve multi-dialect ASR systems. Liu and Fung (2006) use auxiliary accent trees to model Chinese accent variation.…”
Section: Introductionmentioning
confidence: 99%
“…propose a Transformer-based encoder to simultaneously detect the dialect and transcribe an audio sample. More recently, with increased interest in self-supervised learning, Deng et al (2021) explored self-supervised learning techniques to predict the accent from speech and use the predicted information to train an accent-specific self-supervised ASR. They report that such a model significantly outperforms an accent-independent ASR system.…”
Section: Introductionmentioning
confidence: 99%
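The accent-conditioned setup described in the statement above amounts to a routing step: a predicted accent label selects an accent-specific recognizer. The sketch below is an illustrative assumption (function and recognizer names are hypothetical, not the authors' implementation):

```python
def route_by_accent(features, accent_classifier, recognizers, fallback_asr):
    """Route an utterance to an accent-specific recognizer.

    `accent_classifier` predicts an accent label from the input features;
    the label selects one recognizer from the `recognizers` mapping,
    falling back to an accent-independent model when the predicted
    accent has no dedicated recognizer.
    """
    accent = accent_classifier(features)
    asr = recognizers.get(accent, fallback_asr)
    return asr(features)
```

In practice the classifier and recognizers would share a self-supervised encoder; the dict-dispatch here only shows the control flow, not the model architecture.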