Improving Speech Emotion Recognition via Fine-tuning ASR with Speaker Information

Ta, Bao Thang; Nguyễn, Tùng Lâm; Dang, Dinh Son; Le, Nhat Minh; Hai, Van

doi:10.23919/apsipaasc55919.2022.9980214

Cited by 3 publications

References 32 publications

A Gaussian Distribution Labeling Method for Speech Quality Assessment

Le, Ta, et al. 2024

Lecture Notes in Computer Science

An Automatic Pipeline For Building Emotional Speech Dataset

Thi, Thang Ta, et al. 2023

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning

Testa, Xiao, Sharma et al. 2023

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.

Smart speaker voice assistants (VAs) such as Amazon Echo and Google Home have been widely adopted due to their seamless integration with smart home devices and Internet of Things (IoT) technologies. These VA services raise privacy concerns, especially due to their access to our speech. This work considers one such use case: the unaccountable and unauthorized surveillance of a user's emotion via speech emotion recognition (SER). This paper presents DARE-GP, a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech. DARE-GP does this by using a constrained genetic programming approach to learn the spectral frequency traits that depict target users' emotional content, and then generating a universal adversarial audio perturbation that provides this privacy protection. Unlike existing works, DARE-GP provides a) real-time protection of previously unheard utterances, b) against previously unseen black-box SER classifiers, c) while protecting speech transcription, and d) in a realistic acoustic environment. Further, this evasion is robust against defenses employed by a knowledgeable adversary. The evaluations in this work culminate with acoustic evaluations against two off-the-shelf commercial smart speakers, using a small-form-factor device (Raspberry Pi) integrated with a wake-word system to evaluate the efficacy of its real-world, real-time deployment.
