A Lightly Supervised Approach to Detect Stuttering in Children's Speech

Alharbi, Sadeen; Hasan, Madina; Simons, Anthony J. H.; Brumfitt, Shelagh; Green, Phil

doi:10.21437/interspeech.2018-2155

Cited by 34 publications

(39 citation statements)

References 25 publications

“…As a result, while these methods struggle with sub-word stutters such as sound repetition or revision, they perform well for word repetition or prolongation. This can be observed in Table 3 as [11] performs better than our method by a small margin (3.2%) for word repetition. Additionally, [11] performs with a lower miss rate than ours for detection of prolongation (5.92%).…”

Section: Performance and Comparison (mentioning)

confidence: 62%

“…Another model uses Bi-LSTMs with conditional random fields (CRFs) to get an average F-score of 85.9% across all stutter types [15]. The current state-of-the-art stutter classification method uses task-oriented finite state transducer (FST) lattices to detect repetition stutters with an average 37% miss rate across 4 different types of [11].…”

Section: Related Work (mentioning)

confidence: 99%

“…The results of our experiments for the UCLASS dataset are summarized in Table 3, where we compare our method to [11]. Additionally, to evaluate the need for bidirectional LSTM as opposed to a unidirectional LSTM, we compare our results to a baseline model where a ResNet with LSTM is used instead of our proposed model.…”

Section: Performance and Comparison (mentioning)

confidence: 99%

“…This is partially due to the fact that the notion of detecting and classifying the type and location of stutters can be a difficult problem, especially when factoring in variables such as gender, speech rate, accent, and phone-realization [8]. Existing works in the area mostly rely on automatic speech recognition (ASR) to first convert audio signals to text, and then utilize language models to detect and identify the stutters [9] [10] [11]. While this approach has proven effective and achieved promising results, the reliance on ASR can both be a potential source for error, as well as an unnecessary additional computational step.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting Multiple Speech Disfluencies Using a Deep Residual Network with Bidirectional Long Short-Term Memory

Kourkounakis, Hajavi, Etemad (2020)

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Stuttering is a speech impediment affecting tens of millions of people on an everyday basis. Even with its commonality, there is minimal data and research on the identification and classification of stuttered speech. This paper tackles the problem of detection and classification of different forms of stutter. As opposed to most existing works that identify stutters with language models, our work proposes a model that relies solely on acoustic features, allowing for identification of several variations of stutter disfluencies without the need for speech recognition. Our model uses a deep residual network and bidirectional long short-term memory layers to classify different types of stutters and achieves an average miss rate of 10.03%, outperforming the state-of-the-art by almost 27%.
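The "almost 27%" figure in this abstract appears to be a difference in percentage points between the 37% average miss rate attributed to [11] in the Related Work statement above and the 10.03% reported here; that reading is an assumption, not something the abstract states. A minimal sketch of the arithmetic, using the standard definition of miss rate (missed events divided by all true events) and hypothetical counts:

```python
def miss_rate(false_negatives: int, true_positives: int) -> float:
    """Fraction of true stutter events that the detector failed to flag."""
    total = false_negatives + true_positives
    if total == 0:
        raise ValueError("no reference events to score")
    return false_negatives / total


# Hypothetical counts for illustration only; not taken from either paper.
example = miss_rate(false_negatives=3, true_positives=27)  # 3 / 30 = 0.1

# Figures quoted in the citation statement and abstract above, in percent.
baseline_miss = 37.0   # average miss rate attributed to [11]
proposed_miss = 10.03  # average miss rate reported in this abstract

# Assumed interpretation: improvement measured in percentage points.
gap_points = baseline_miss - proposed_miss
print(f"example miss rate: {example:.2f}, gap: {gap_points:.2f} points")
```

Under this reading the gap is about 26.97 points, which is consistent with "almost 27%". A relative reduction would give a much larger number, so the percentage-point interpretation seems the more likely one.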

“…The focus of this paper is on detection of five stuttering event types: Blocks, Prolongations, Sound Repetitions, Word/Phrase Repetitions, and Interjections. Existing work has explored this problem using traditional signal processing techniques [15,16,17], language modeling (LM) [12,18,19,20,21], and acoustic modeling (AM) [21,10]. Each approach has been shown to be effective.…”

Section: Introduction (mentioning)

confidence: 99%

SEP-28k: A Dataset for Stuttering Event Detection from Podcasts with People Who Stutter

Mitra, Joshi, Kajarekar, et al. (2021)

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The ability to automatically detect stuttering events in speech could help speech pathologists track an individual's fluency over time or help improve speech recognition systems for people with atypical speech patterns. Despite increasing interest in this area, existing public datasets are too small to build generalizable dysfluency detection systems and lack sufficient annotations. In this work, we introduce Stuttering Events in Podcasts (SEP-28k), a dataset containing over 28k clips labeled with five event types including blocks, prolongations, sound repetitions, word repetitions, and interjections. Audio comes from public podcasts largely consisting of people who stutter interviewing other people who stutter. We benchmark a set of acoustic models on SEP-28k and the public FluencyBank dataset and highlight how simply increasing the amount of training data improves relative detection performance by 28% and 24% F1 on each. Annotations from over 32k clips across both datasets will be publicly released.
