2021
DOI: 10.48550/arxiv.2102.01813
Preprint

Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

Abstract: In Speech Emotion Recognition (SER), emotional characteristics often appear in diverse forms of energy patterns in spectrograms. Typical attention neural network classifiers of SER are usually optimized on a fixed attention granularity. In this paper, we apply multiscale area attention in a deep convolutional neural network to attend emotional characteristics with varied granularities and therefore the classifier can benefit from an ensemble of attentions with different scales. To deal with data sparsity, we c…

Cited by 1 publication (1 citation statement)
References 20 publications (20 reference statements)
“…AER has been extensively studied for audio (Xu et al 2021), text (Calefato, Lanubile, and Novielli 2017), facial clues (Chen et al 2016; Luo et al 2017), and EEG-based brain waves (Tripathi et al 2017). Previous studies showed Figure 1: Basic overview of our approach to multimodal emotion recognition.…”
Section: Introduction (mentioning)
confidence: 99%