Child Speech Disorder Detection with Siamese Recurrent Network Using Speech Attribute Features

Wang, Jiarui; Qin, Ying; Peng, Zhiyuan; Lee, Tan

doi:10.21437/interspeech.2019-2320

Cited by 20 publications

(21 citation statements)

References 13 publications

(12 reference statements)


“…Mauro et al incorporate the speech of a reference speaker to detect mispronunciations at the phoneme level [11]. Wang et al use siamese networks for modeling discrepancy between normal and distorted children's speech [12]. We take a similar approach but we do not need a database of reference speech.…”

Section: Related Work (mentioning)

confidence: 99%

Mispronunciation Detection in Non-Native (L2) English with Uncertainty Modeling

Korzekwa

Lorenzo-Trueba

Zaporowski

et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result in a significant amount of false mispronunciation alarms. We propose a novel approach to overcome this problem based on two principles: a) taking into account uncertainty in the automatic phoneme recognition step, b) accounting for the fact that there may be multiple valid pronunciations. We evaluate the model on non-native (L2) English speech of German, Italian and Polish speakers, where it is shown to increase the precision of detecting mispronunciations by up to 18% (relative) compared to the common approach.



“…The absolute difference of the two LSTM networks' output is calculated and then fed as input to the fully connected layer with the two outputs as illustrated in figure 3. This architecture is similar with the one used in [19]. Henceforth, the first model will be referred to as MaLSTM, while the second model is called Siamese-Classifier.…”

Section: = σ( (mentioning)

confidence: 99%

Al-Quran recitation verification for memorization test using Siamese LSTM network

Rajagede

Hastuti

2021

CST


In the process of verifying Al-Quran memorization, a person is usually asked to recite a verse without looking at the text. This process is generally done together with a partner to verify the reading. This paper proposes a model using Siamese LSTM Network to help users check their Al-Quran memorization alone. Siamese LSTM network will verify the recitation by matching the input with existing data for a read verse. This study evaluates two Siamese LSTM architectures, the Manhattan LSTM and the Siamese-Classifier. The Manhattan LSTM outputs a single numerical value that represents the similarity, while the Siamese-Classifier uses a binary classification approach. In this study, we compare Mel-Frequency Cepstral Coefficient (MFCC), Mel-Frequency Spectral Coefficient (MFSC), and delta features against model performance. We use the public dataset from Every Ayah website and provide the usage information for future comparison. Our best model, using MFCC with delta and Manhattan LSTM, produces an F1-score of 77.35%.
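The two comparison heads described in this entry can be illustrated with a minimal sketch. It assumes the Manhattan LSTM uses the usual exp(-L1 distance) similarity from the MaLSTM literature; the abstract itself only says that a single similarity value is produced. The function names and the example vectors are hypothetical, and the LSTM branches and the fully connected layer are left out.

```python
import math

def manhattan_similarity(h_a, h_b):
    """Return exp(-L1 distance) between two encoder outputs; the value lies in (0, 1]."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h_a, h_b)))

def abs_difference(h_a, h_b):
    """Return the element-wise |h_a - h_b| vector that a classifier head would take as input."""
    return [abs(a - b) for a, b in zip(h_a, h_b)]

# Hypothetical final outputs of the two weight-sharing LSTM branches.
h_reference = [0.2, -0.5, 0.9]
h_input = [0.1, -0.4, 0.7]

similarity = manhattan_similarity(h_reference, h_input)  # single similarity score
features = abs_difference(h_reference, h_input)          # input to a two-output layer
```

Identical vectors give a similarity of exactly 1, and the score decays toward 0 as the outputs diverge, so a threshold on it can serve as the verification decision. The absolute-difference vector keeps per-dimension information and leaves the decision to a learned classifier.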


“…GRU is a simplified architecture with an efficiency degree that is comparable to LSTM. These two approaches have been adopted for building automatic speech assessment systems [10,[16][17][18][19], e.g., the work done by Korzekwa et al on dysarthric speech [16].…”

Section: Automatic Assessment Approaches (mentioning)

confidence: 99%

“…Mel-frequency cepstral coefficients (MFCCs) are commonly used in speech assessment systems for acoustic modeling [10,[17][18][19]24] and feature extraction [25,26]. While deep learning models recently attract intense attentions, Mel Spectrogram is also getting increasingly popular [10,12,16].…”

Section: Speech Representation (mentioning)

confidence: 99%


Classifying Speech Intelligibility Levels of Children in Two Continuous Speech Styles

Lin

Tseng

2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Speech difficulties of children may result from pathological problems. Oral language is normally assessed by expert-directed impressionistic judgments on varying speech types. This paper attempts to construct automatic systems that help detect children with severe speech problems at an early stage. Two continuous speech types, repetitive and storytelling speech, produced by Chinese-speaking hearing and hearing-impaired children are applied to Long Short-Term Memory (LSTM) and Universal Transformer (UT) models. Three approaches to extracting acoustic features are adopted: MFCCs, Mel Spectrogram, and acoustic-phonetic features. Results of leave-one-out cross-validation and models trained by augmented data show that MFCCs are more useful than Mel Spectrogram and acoustic-phonetic features. Respective LSTM and UT models have their own advantages in different settings. Eventually, our model trained on repetitive speech is able to achieve an F1-score of 0.74 for testing on storytelling speech.

