The ACM Multimedia 2022 Computational Paralinguistics Challenge

Schuller, Björn W.; Batliner, Anton; Amiriparian, Shahin; Bergler, Christian; Gerczuk, Maurice; Holz, Natalie; Larrouy-Maestri, Pauline; Bayerl, Sebastien; Riedhammer, Korbinian; Mallol-Ragolta, Adria; Pateraki, Maria; Coppock, Harry; Kiskin, Ivan; Sinka, Marianne; Roberts, Stephen

doi:10.1145/3503161.3551591

Cited by 27 publications

(14 citation statements)

References 42 publications


“…Comparing the current models that used 3-s KSoF data with Schuller et al (2022) showed that our models performed less well. Schuller et al (2022) reported only unweighted average recall (UAR), achieving a 37.6 UAR in test using a set of one hundred principal components from a 6,373-feature set. In comparison, using a feature set of 1,136 on the KSoF intervals, the G-SVM yielded a UAR 25.47.…”

Section: Discussion (mentioning)

confidence: 84%

“…This does not invalidate the conclusion that event-based approaches lead to better machine learning models since the UAR of the UCLASS event-based G-SVM (UAR = 40.66) outperformed Schuller’s reference. Rather, models can be further improved by: (a) Supplying a richer feature set as demonstrated by Schuller et al (2022) ; and (b) Using event-based segmentation methods.…”

Section: Discussion (mentioning)

confidence: 99%

“…The KSoF dataset contains 4,601 3-s intervals of speech of which 2,907 had valid singular labels for fluent speech, prolongation, part-word repetition (PWR), whole word repetition (WWR), and blocks. Here, the data were split into training ( N = 1,545), validation ( N = 662), and test ( N = 700) folds which was the split that Schuller et al (2022) used. KSoF also has filler ( N = 390), modified speech ( N = 1,203), and garbage intervals ( N = 101).…”

Section: Methods (mentioning)

confidence: 99%
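The counts quoted in the Methods statement above can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the numbers as quoted; the dictionary names are illustrative, and it assumes (without confirmation from the source) that the labelled, filler, modified-speech, and garbage intervals are meant to partition the full set of 4,601 intervals.

```python
# Counts copied from the quoted citation statement; names are illustrative.
labelled_split = {"train": 1545, "validation": 662, "test": 700}
other_intervals = {"filler": 390, "modified_speech": 1203, "garbage": 101}

# Intervals with a valid single label (fluent, prolongation, PWR, WWR, block).
labelled_total = sum(labelled_split.values())

# Assumption: the remaining categories account for the rest of the corpus.
grand_total = labelled_total + sum(other_intervals.values())

print(labelled_total, grand_total)
```

Under that assumption the three folds sum to the 2,907 labelled intervals and all categories together sum to the 4,601 intervals reported for KSoF.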


Comparison of performance of automatic recognizers for stutters in speech trained with event or interval markers

Barrett, Tang, Howell. 2024. Front. Psychol.


Introduction: Automatic recognition of stutters (ARS) from speech recordings can facilitate objective assessment and intervention for people who stutter. However, the performance of ARS systems may depend on how the speech data are segmented and labelled for training and testing. This study compared two segmentation methods: event-based, which delimits speech segments by their fluency status, and interval-based, which uses fixed-length segments regardless of fluency.

Methods: Machine learning models were trained and evaluated on interval-based and event-based stuttered speech corpora. The models used acoustic and linguistic features extracted from the speech signal and the transcriptions generated by a state-of-the-art automatic speech recognition system.

Results: The results showed that event-based segmentation led to better ARS performance than interval-based segmentation, as measured by the area under the curve (AUC) of the receiver operating characteristic. The results suggest differences in the quality and quantity of the data because of segmentation method. The inclusion of linguistic features improved the detection of whole-word repetitions, but not other types of stutters.

Discussion: The findings suggest that event-based segmentation is more suitable for ARS than interval-based segmentation, as it preserves the exact boundaries and types of stutters. The linguistic features provide useful information for separating supra-lexical disfluencies from fluent speech but may not capture the acoustic characteristics of stutters. Future work should explore more robust and diverse features, as well as larger and more representative datasets, for developing effective ARS systems.


“…Collecting a balanced dataset is difficult and expensive for the stuttering detection task. Other datasets such as Kassel State of Fluency (KSoF) (not publicly accessible) [36], FluencyBank [28] also suffer from this issue. Over the years, the class imbalance problem is one of the main concerns due to its prevalence, especially in the biomedical domain.…”

Section: A Class Imbalance (mentioning)

confidence: 99%

“…To evaluate the model performance, we use the following metrics: macro F1-score and accuracy which are the standard and are widely used in the stuttered speech domain [27], [28], [31], [36], [67]. The macro F1-score (F1) (which combines the advantages of both precision and recall in a single metric unlike unweighted average recall which only takes recall into account) from equation (3) is often used in class imbalance scenarios with the intention to give equal importance to frequent and infrequent classes, and also is more robust towards the error type distribution [68].…”

Section: Evaluation Metrics (mentioning)

confidence: 99%
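The contrast drawn in the statement above between unweighted average recall (UAR) and the macro F1-score can be illustrated with a small, dependency-free sketch. This is not code from either cited paper: the function names and the toy labels are hypothetical, and both metrics are averaged over the classes that occur in the reference labels.

```python
def _per_class_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class counts equally."""
    labels = sorted(set(y_true))
    recalls = []
    for label in labels:
        tp, _, fn = _per_class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))  # tp + fn > 0: label occurs in y_true
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Mean of per-class F1, which also penalises false positives."""
    labels = sorted(set(y_true))
    scores = []
    for label in labels:
        tp, fp, fn = _per_class_counts(y_true, y_pred, label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy data: a classifier that always predicts "fluent".
y_true = ["fluent"] * 8 + ["block"] * 2
y_pred = ["fluent"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(accuracy, uar, f1)
```

On this toy example accuracy is 0.8, while UAR is 0.5 and macro F1 is about 0.44: both class-averaged metrics expose the missed minority class, and macro F1 additionally reflects the false positives charged to the majority class.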

Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning

Sheikh, Sahidullah, Hirsch, et al. 2023. IEEE J. Biomed. Health Inform.


Stuttering is a neuro-developmental speech impairment characterized by uncontrolled utterances (interjections) and core behaviors (blocks, repetitions, and prolongations), and is caused by the failure of speech sensorimotors. Due to its complex nature, stuttering detection (SD) is a difficult task. If detected at an early stage, it could facilitate speech therapists to observe and rectify the speech patterns of persons who stutter (PWS). The stuttered speech of PWS is usually available in limited amounts and is highly imbalanced. To this end, we address the class imbalance problem in the SD domain via a multibranching (MB) scheme and by weighting the contribution of classes in the overall loss function, resulting in a huge improvement in stuttering classes on the SEP-28k dataset over the baseline (StutterNet). To tackle data scarcity, we investigate the effectiveness of data augmentation on top of a multi-branched training scheme. The augmented training outperforms the MB StutterNet (clean) by a relative margin of 4.18% in macro F1-score (F1). In addition, we propose a multi-contextual (MC) StutterNet, which exploits different contexts of the stuttered speech, resulting in an overall improvement of 4.48% in F1 over the single context based MB StutterNet. Finally, we have shown that applying data augmentation in the cross-corpora scenario can improve the overall SD performance by a relative margin of 13.23% in F1 over the clean training.


DCRNNX: Dual-Channel Recurrent Neural Network with Xgboost for Emotion Identification Using Nonspeech Vocalizations

Liang, Zou, Xie, et al. 2022. Artificial Intelligence and Mobile Services – AIMS 2022
