Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3551588
End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

Abstract: In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark with several methods for SD. Our proposed self-supervised based SD system achieves a UAR of 36.9% and 41.0% on validation and test …
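The abstract describes extracting embeddings from a pre-trained Wav2Vec2.0 model and benchmarking downstream classifiers on them. The paper's exact pipeline is not reproduced here; a minimal sketch of this general embedding-extraction approach, assuming the Hugging Face transformers checkpoint name, mean pooling over time, and a logistic-regression head (none of which are confirmed by the abstract), might look like:

```python
# Minimal sketch: pool Wav2Vec2.0 hidden states into clip-level embeddings
# and fit a simple downstream classifier. Checkpoint, pooling strategy,
# and classifier are illustrative assumptions, not the authors' setup.
import torch
import numpy as np
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

CHECKPOINT = "facebook/wav2vec2-base-960h"  # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
encoder = Wav2Vec2Model.from_pretrained(CHECKPOINT).eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool the last hidden layer into one fixed-size vector per clip."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # shape (1, T, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# train_clips and train_labels are hypothetical placeholders for KSoF audio.
# X = np.stack([embed(clip) for clip in train_clips])
# clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, train_labels)
```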

Cited by 3 publications (3 citation statements). References 20 publications (18 reference statements).
“…The fine-tuned stuttering classification system, using the stuttering data anonymized with either the baseline or the improved model, outperforms the baseline system by a substantial margin [9] and even outperforms some submissions to the challenge [11,30,10]. The overall results include the recall value for the garbage class in the UAR, which somewhat diminishes the results of keeping the dysfluency information.…”
Section: Results
confidence: 97%
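The point about the garbage-class recall being folded into the UAR can be illustrated with scikit-learn; the class names and toy predictions below are hypothetical placeholders, not KSoF results:

```python
# Hedged sketch: UAR is the macro-averaged per-class recall. Restricting
# the label set shows how including or excluding the garbage class shifts
# the score. Labels and predictions are illustrative, not real data.
from sklearn.metrics import recall_score

classes = ["block", "repetition", "no_disfluency", "garbage"]  # simplified label set
y_true = ["block", "repetition", "garbage", "no_disfluency", "block", "garbage"]
y_pred = ["block", "no_disfluency", "garbage", "no_disfluency", "repetition", "garbage"]

uar_with_garbage = recall_score(y_true, y_pred, labels=classes, average="macro")
uar_without_garbage = recall_score(
    y_true, y_pred,
    labels=[c for c in classes if c != "garbage"],
    average="macro",
)
print(f"UAR incl. garbage: {uar_with_garbage:.3f}  excl. garbage: {uar_without_garbage:.3f}")
```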
“…Most contributions used W2V2-based systems. [10] and [11] used pre-trained W2V2 models as feature extractors. Grósz et al. used fine-tuning of different W2V2 models for stuttering classification, yielding an unweighted average recall of up to 62.1% on the eight-class problem [12].…”
Section: Introduction
confidence: 99%
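As a rough illustration of the fine-tuning route mentioned in the statement above, a classification head can be attached to a pre-trained W2V2 encoder and trained end-to-end; the checkpoint, eight-class setup, and optimiser settings below are assumptions, not the configuration used by Grósz et al.:

```python
# Hedged sketch: fine-tune a pre-trained Wav2Vec2 model with a sequence-
# classification head. Checkpoint, learning rate, and training loop are
# illustrative assumptions only.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

CHECKPOINT = "facebook/wav2vec2-base"  # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=8)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
# dataloader is a hypothetical iterator over (list of 16 kHz waveforms, label tensor)
# for waveforms, labels in dataloader:
#     batch = extractor(waveforms, sampling_rate=16000, return_tensors="pt", padding=True)
#     out = model(**batch, labels=labels)  # cross-entropy over the 8 classes
#     out.loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
```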
“…To evaluate the model performance, we use the following metrics: macro F1-score and accuracy, which are standard and widely used in the stuttered speech domain [27], [28], [31], [36], [67]. The macro F1-score (F1) from equation (3), which combines the advantages of both precision and recall in a single metric (unlike unweighted average recall, which only takes recall into account), is often used in class-imbalance scenarios with the intention of giving equal importance to frequent and infrequent classes, and is also more robust to the error-type distribution [68].…”
Section: Evaluation Metrics
confidence: 99%
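For reference, the two metrics contrasted in this statement can be computed directly with scikit-learn; the toy labels below are placeholders, not evaluation data from any of the cited systems:

```python
# Hedged sketch: macro F1 averages per-class F1 (precision and recall),
# while UAR averages per-class recall only; accuracy ignores the class
# balance entirely. Toy labels are illustrative placeholders.
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 2]   # imbalanced toy ground truth
y_pred = [0, 0, 0, 1, 1, 2, 2]

macro_f1 = f1_score(y_true, y_pred, average="macro")      # mean per-class F1
uar = recall_score(y_true, y_pred, average="macro")       # mean per-class recall
acc = accuracy_score(y_true, y_pred)
print(f"macro F1={macro_f1:.3f}  UAR={uar:.3f}  accuracy={acc:.3f}")
```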