Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3551572
Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering

Cited by 16 publications (9 citation statements)
References 12 publications
“…In [48], several data augmentation techniques, such as noise addition, time stretching, pitch shifting, time shifting, and masking, were analysed for dementia detection. There are some studies on data augmentation targeting text-based stuttering detection [13]; however, in the case of audio-based stuttering/disfluency detection, this has not yet been studied and analysed in depth [49].…”
Section: B Data Augmentation
confidence: 99%
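The augmentations named in the statement above (noise, time shifting, masking) can be sketched directly on a raw waveform. This is a minimal illustration with NumPy; the function names and the fixed-SNR noise formulation are my own assumptions, not taken from the cited works.

```python
import numpy as np

def add_noise(signal, snr_db=20.0, rng=None):
    # Additive white Gaussian noise at a target signal-to-noise ratio (dB).
    rng = rng or np.random.default_rng(0)
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def time_shift(signal, shift):
    # Circularly shift the waveform by `shift` samples.
    return np.roll(signal, shift)

def time_mask(signal, start, width):
    # Zero out a span of samples (a waveform analogue of SpecAugment-style masking).
    out = signal.copy()
    out[start:start + width] = 0.0
    return out
```

Pitch shifting and time stretching need resampling or phase-vocoder machinery and are usually delegated to an audio library rather than written by hand.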
“…Therefore, the pre-trained models can serve as powerful feature extractors in detection systems based on the two-stage pipeline architecture [11]. The main advantage of pre-trained models is that they can easily be fine-tuned on small amounts of labeled data to achieve state-of-the-art results on the required task [13]–[16]. When the wav2vec2 model is fine-tuned on a specific task, it can draw on the general characteristics of speech that it learned from the large amount of speech data seen during pre-training.…”
Section: Introduction
confidence: 99%
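The two-stage pipeline described in the statement above amounts to: (1) extract fixed-size utterance embeddings with a frozen pre-trained model, (2) train a lightweight classifier on top. A minimal sketch, using random placeholder embeddings in place of real wav2vec2 hidden states (which would be mean-pooled per utterance); the 768-dimensional size matches the wav2vec2 base model, but everything else here is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stage 1 (stand-in): placeholder embeddings in place of frozen
# wav2vec2 features; a real system would mean-pool hidden states
# over time for each utterance.
X_train = rng.normal(size=(200, 768))
y_train = rng.integers(0, 2, size=200)  # binary labels, e.g. stutter / no stutter

# Stage 2: a lightweight classifier trained on the fixed embeddings.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = clf.predict(rng.normal(size=(10, 768)))
```

The appeal of this design is that only the small second-stage model is trained, so a few labeled examples suffice; full fine-tuning of wav2vec2 itself typically yields stronger results at a higher compute cost.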
“…The organisers of this year's competition presented several solutions as baselines, such as DeepSpectrum [2], AuDeep [1,8], and the ComParE Acoustic Feature Set. Lastly, the popular pre-trained wav2vec2 model [3], which has exhibited remarkable results in various paralinguistic domains [9,11,17,23,25], was also employed as a baseline.…”
Section: Introduction
confidence: 99%