2021 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt48900.2021.9383574
Articulatory Comparison of L1 and L2 Speech for Mispronunciation Diagnosis

Cited by 4 publications
(3 citation statements)
References 11 publications
“…The number of augmented sentences was ten times that of the original TORGO dataset. The synthesized speech was used for training the DNN-HMM ASR model, trained on fMLLR-transformed features. Baseline configuration files provided in the Pytorch-kaldi repository for common speech databases such as TIMIT and Librispeech were used as reference, and the final architecture was based on experimental results using a small number of training-set speakers (Khanal, Johnson et al 2021). Our ASR model includes a light bidirectional GRU (Ravanelli, Parcollet et al 2019) architecture, with five layers of 1024 cells each, ReLU activation, and a dropout of 0.2.…”
Section: Frame Level Masking
confidence: 99%
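The acoustic model described in the quote above can be sketched as follows. This is a hedged illustration in PyTorch: the cited work uses the "light" GRU (Li-GRU) cell of Ravanelli et al., a single-gate ReLU variant, whereas this sketch substitutes the standard `nn.GRU` as a stand-in; the feature and output dimensions (`n_feats`, `n_states`) are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

class BiGRUAcousticModel(nn.Module):
    """Sketch of the quoted architecture: five bidirectional GRU layers of
    1024 cells each, ReLU activation, dropout 0.2, followed by a per-frame
    linear layer over HMM states. Standard nn.GRU is used here in place of
    the Li-GRU cell, so cell internals differ from the cited model."""

    def __init__(self, n_feats: int = 40, n_states: int = 1000):
        super().__init__()
        # dropout=0.2 is applied between the five stacked GRU layers
        self.gru = nn.GRU(n_feats, 1024, num_layers=5,
                          bidirectional=True, dropout=0.2, batch_first=True)
        self.act = nn.ReLU()
        self.out = nn.Linear(2 * 1024, n_states)  # 2x for bidirectional

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, n_feats) of fMLLR-transformed features
        h, _ = self.gru(x)
        return self.out(self.act(h))  # (batch, frames, n_states)
```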
“…The architecture applies monophone regularization (Ravanelli, Brakel et al 2018). A multi-task learning procedure was applied using two SoftMax classifiers, one estimating context-dependent states and the second one predicting monophone targets (Khanal, Johnson et al 2021). For testing, a leave-one-speaker-out cross-validation procedure was applied across the original TORGO dataset.…”
Section: Frame Level Masking
confidence: 99%
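The monophone-regularized multi-task setup in the quote above can be sketched as follows: two softmax classifiers share the same recurrent features, one predicting context-dependent states and the other monophone targets, and their losses are summed. Layer sizes, the `mono_weight` scaling factor, and all names here are illustrative assumptions, not taken from the cited work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    """Sketch of multi-task training with monophone regularization: a
    context-dependent (tied-state) head and a monophone head operate on
    shared frame features; the monophone loss acts as a regularizer."""

    def __init__(self, feat_dim: int = 2048, n_cd_states: int = 1000,
                 n_mono: int = 48, mono_weight: float = 1.0):
        super().__init__()
        self.cd_head = nn.Linear(feat_dim, n_cd_states)
        self.mono_head = nn.Linear(feat_dim, n_mono)
        self.mono_weight = mono_weight

    def forward(self, feats, cd_targets, mono_targets):
        # feats: (n_frames, feat_dim); targets: frame-level label indices.
        # cross_entropy applies log-softmax internally, so each head is
        # effectively a softmax classifier over its label set.
        cd_loss = F.cross_entropy(self.cd_head(feats), cd_targets)
        mono_loss = F.cross_entropy(self.mono_head(feats), mono_targets)
        return cd_loss + self.mono_weight * mono_loss
```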
“…EMA [39, 40, 41] is useful for localizing movement within the vocal tract: electromagnetic transmitter coils track the positions of sensors attached to the tongue, lips, and jaw. EMA can provide 2D or 3D landmark localization with millisecond temporal resolution, but the system is complex to operate and too uncomfortable for everyday use; it is better suited to clinical studies at research centres.…”
Section: Introduction
confidence: 99%