2021 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt48900.2021.9383574
Articulatory Comparison of L1 and L2 Speech for Mispronunciation Diagnosis

Cited by 4 publications
(3 citation statements)
References 11 publications
“…The number of augmented sentences was ten times that of the original TORGO dataset. The synthesized speech was used for training the DNN-HMM ASR model, trained on fMLLR-transformed features. Baseline configuration files provided in the Pytorch-kaldi repository for common speech databases such as TIMIT and Librispeech were used as reference, and the final architecture was based on experimental results using a small number of training-set speakers (Khanal, Johnson et al 2021). Our ASR model includes a light bidirectional GRU (Ravanelli, Parcollet et al 2019) architecture, with five layers of 1024 cells each, ReLU activation, and a dropout of 0.2.…”
Section: Frame Level Masking
confidence: 99%
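The acoustic model described in the quote above can be sketched as follows. This is a hedged illustration in PyTorch: the cited work uses the "light" GRU (Li-GRU) cell of Ravanelli et al., a single-gate ReLU variant, whereas this sketch substitutes the standard `nn.GRU` as a stand-in; the feature and output dimensions (`n_feats`, `n_states`) are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

class BiGRUAcousticModel(nn.Module):
    """Sketch of the quoted architecture: five bidirectional GRU layers of
    1024 cells each, ReLU activation, dropout 0.2, followed by a per-frame
    linear layer over HMM states. Standard nn.GRU is used here in place of
    the Li-GRU cell, so cell internals differ from the cited model."""

    def __init__(self, n_feats: int = 40, n_states: int = 1000):
        super().__init__()
        # dropout=0.2 is applied between the five stacked GRU layers
        self.gru = nn.GRU(n_feats, 1024, num_layers=5,
                          bidirectional=True, dropout=0.2, batch_first=True)
        self.act = nn.ReLU()
        self.out = nn.Linear(2 * 1024, n_states)  # 2x for bidirectional

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, n_feats) of fMLLR-transformed features
        h, _ = self.gru(x)
        return self.out(self.act(h))  # (batch, frames, n_states)
```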
“…The architecture applies monophone regularization (Ravanelli, Brakel et al 2018). A multi-task learning procedure was applied using two SoftMax classifiers, one estimating context-dependent states and the second one predicting monophone targets (Khanal, Johnson et al 2021). For testing, a leave-one-speaker-out cross-validation procedure was applied across the original TORGO dataset.…”
Section: Frame Level Masking
confidence: 99%
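The monophone-regularized multi-task setup in the quote above can be sketched as follows: two softmax classifiers share the same recurrent features, one predicting context-dependent states and the other monophone targets, and their losses are summed. Layer sizes, the `mono_weight` scaling factor, and all names here are illustrative assumptions, not taken from the cited work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    """Sketch of multi-task training with monophone regularization: a
    context-dependent (tied-state) head and a monophone head operate on
    shared frame features; the monophone loss acts as a regularizer."""

    def __init__(self, feat_dim: int = 2048, n_cd_states: int = 1000,
                 n_mono: int = 48, mono_weight: float = 1.0):
        super().__init__()
        self.cd_head = nn.Linear(feat_dim, n_cd_states)
        self.mono_head = nn.Linear(feat_dim, n_mono)
        self.mono_weight = mono_weight

    def forward(self, feats, cd_targets, mono_targets):
        # feats: (n_frames, feat_dim); targets: frame-level label indices.
        # cross_entropy applies log-softmax internally, so each head is
        # effectively a softmax classifier over its label set.
        cd_loss = F.cross_entropy(self.cd_head(feats), cd_targets)
        mono_loss = F.cross_entropy(self.mono_head(feats), mono_targets)
        return cd_loss + self.mono_weight * mono_loss
```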
“…EMA [39, 40, 41] is useful for localizing movement within the vocal tract: electromagnetic transmitter coils track the positions of sensors attached to the tongue, lips, and jaw. EMA can provide 2D or 3D landmark localization with millisecond temporal resolution, but the system is complex to operate and too uncomfortable for everyday use; it is better suited to clinical studies at research centres.…”
Section: Introduction
confidence: 99%