Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3551588
End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

Abstract: In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark with several methods for SD. Our proposed self-supervised based SD system achieves a UAR of 36.9% and 41.0% on validation and test …
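The abstract describes extracting embeddings from a pre-trained Wav2Vec2.0 model and benchmarking downstream classifiers on them. The paper's exact pipeline is not reproduced here; a minimal sketch of this general embedding-extraction approach, assuming the Hugging Face transformers checkpoint name, mean pooling over time, and a logistic-regression head (none of which are confirmed by the abstract), might look like:

```python
# Minimal sketch: pool Wav2Vec2.0 hidden states into clip-level embeddings
# and fit a simple downstream classifier. Checkpoint, pooling strategy,
# and classifier are illustrative assumptions, not the authors' setup.
import torch
import numpy as np
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.linear_model import LogisticRegression

CHECKPOINT = "facebook/wav2vec2-base-960h"  # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
encoder = Wav2Vec2Model.from_pretrained(CHECKPOINT).eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool the last hidden layer into one fixed-size vector per clip."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # shape (1, T, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# train_clips and train_labels are hypothetical placeholders for KSoF audio.
# X = np.stack([embed(clip) for clip in train_clips])
# clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, train_labels)
```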

Cited by 3 publications (3 citation statements). References 20 publications (18 reference statements).
“…The fine-tuned stuttering classification system, using the stuttering data anonymized with either the baseline or the improved model, outperforms the baseline system by a substantial margin [9] and even outperforms some submissions to the challenge [11,30,10]. The overall results include the recall value for the garbage class in the UAR, which somewhat diminishes the results of keeping the dysfluency information.…”
Section: Results
confidence: 97%
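The point about the garbage-class recall being folded into the UAR can be illustrated with scikit-learn; the class names and toy predictions below are hypothetical placeholders, not KSoF results:

```python
# Hedged sketch: UAR is the macro-averaged per-class recall. Restricting
# the label set shows how including or excluding the garbage class shifts
# the score. Labels and predictions are illustrative, not real data.
from sklearn.metrics import recall_score

classes = ["block", "repetition", "no_disfluency", "garbage"]  # simplified label set
y_true = ["block", "repetition", "garbage", "no_disfluency", "block", "garbage"]
y_pred = ["block", "no_disfluency", "garbage", "no_disfluency", "repetition", "garbage"]

uar_with_garbage = recall_score(y_true, y_pred, labels=classes, average="macro")
uar_without_garbage = recall_score(
    y_true, y_pred,
    labels=[c for c in classes if c != "garbage"],
    average="macro",
)
print(f"UAR incl. garbage: {uar_with_garbage:.3f}  excl. garbage: {uar_without_garbage:.3f}")
```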
“…Most contributions used W2V2-based systems. [10] and [11] used pre-trained W2V2 models as feature extractors. Grósz et al. used fine-tuning of different W2V2 models for stuttering classification, yielding an unweighted average recall of up to 62.1% on the eight-class problem [12].…”
Section: Introduction
confidence: 99%
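As a rough illustration of the fine-tuning route mentioned in the statement above, a classification head can be attached to a pre-trained W2V2 encoder and trained end-to-end; the checkpoint, eight-class setup, and optimiser settings below are assumptions, not the configuration used by Grósz et al.:

```python
# Hedged sketch: fine-tune a pre-trained Wav2Vec2 model with a sequence-
# classification head. Checkpoint, learning rate, and training loop are
# illustrative assumptions only.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

CHECKPOINT = "facebook/wav2vec2-base"  # assumed checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=8)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
# dataloader is a hypothetical iterator over (list of 16 kHz waveforms, label tensor)
# for waveforms, labels in dataloader:
#     batch = extractor(waveforms, sampling_rate=16000, return_tensors="pt", padding=True)
#     out = model(**batch, labels=labels)  # cross-entropy over the 8 classes
#     out.loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
```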
“…To evaluate the model performance, we use the following metrics: macro F1-score and accuracy, which are standard and widely used in the stuttered speech domain [27], [28], [31], [36], [67]. The macro F1-score (F1) from equation (3), which combines the advantages of both precision and recall in a single metric (unlike unweighted average recall, which only takes recall into account), is often used in class-imbalance scenarios with the intention of giving equal importance to frequent and infrequent classes, and is also more robust to the error-type distribution [68].…”
Section: Evaluation Metrics
confidence: 99%
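For reference, the two metrics contrasted in this statement can be computed directly with scikit-learn; the toy labels below are placeholders, not evaluation data from any of the cited systems:

```python
# Hedged sketch: macro F1 averages per-class F1 (precision and recall),
# while UAR averages per-class recall only; accuracy ignores the class
# balance entirely. Toy labels are illustrative placeholders.
from sklearn.metrics import accuracy_score, f1_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 2]   # imbalanced toy ground truth
y_pred = [0, 0, 0, 1, 1, 2, 2]

macro_f1 = f1_score(y_true, y_pred, average="macro")      # mean per-class F1
uar = recall_score(y_true, y_pred, average="macro")       # mean per-class recall
acc = accuracy_score(y_true, y_pred)
print(f"macro F1={macro_f1:.3f}  UAR={uar:.3f}  accuracy={acc:.3f}")
```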