Interspeech 2020
DOI: 10.21437/interspeech.2020-2408

A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition

Cited by 20 publications (12 citation statements)
References 19 publications

“…
Model                 Size   UA(%)   WA(%)   F1(%)
Han (2014) [2]        12.3   48.20   54.30   -
Li (2019) [3]         9.90   67.40   -       67.10
Zhong (2020) [4]      0.90   71.72   70.39   70.85
Ours (F-Loss, 7sec)   0.88   70.76   70.23   70.20
…”
Section: Methods (mentioning, confidence: 99%)
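The columns in these tables follow the usual SER conventions: WA (weighted accuracy) is plain overall accuracy, while UA (unweighted accuracy) is the mean of per-class recalls, so under-represented emotion classes count equally. A minimal sketch of both metrics (the function name and the toy labels are illustrative, not from the cited papers):

```python
import numpy as np

def wa_ua(y_true, y_pred):
    """WA (weighted accuracy) is overall accuracy; UA (unweighted
    accuracy) is the mean of per-class recalls, so minority emotion
    classes weigh as much as majority ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    wa = np.mean(y_true == y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    ua = np.mean(recalls)
    return wa, ua

# Toy run: 4 emotion classes with imbalanced support
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 0, 1, 0, 2, 0]
print(wa_ua(y_true, y_pred))  # (0.75, 0.625)
```

On the toy labels above, the majority class dominates WA (0.75) while the missed minority classes pull UA down to 0.625, which is why SER papers typically report both.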
“…
Model              Size   UA(%)   WA(%)   F1(%)
Chen (2018) [5]    323    82.82   -       -
Zhao (2019) [8]    4.34   79.70   -       -
Zhong (2020) [4]   0…

We present simulation results to compare our model to several benchmarks on the IEMOCAP (scripted+improvised), IEMOCAP (improvised), and EMO-DB datasets in Tables 2, 3, and 4, respectively. As shown in Table 2, our model has slightly lower WA, UA, and F1 than the Zhong model [4] on the IEMOCAP (scripted+improvised) dataset, which can be attributed to that model's training on different annotations in addition to the label of each utterance. On the EMO-DB dataset, where such additional annotations are unavailable for training, our model outperforms the Zhong model [4] by more than 2.4% (Table 4).…”
Section: Methods (mentioning, confidence: 99%)
“…As one of the most important tasks of affective computing, speech emotion recognition (SER) aims to detect the emotional states of speakers and has a wide range of applications, such as health-care systems and human-machine interaction [1]. With the development of deep learning, many studies have employed convolutional neural network (CNN) and recurrent neural network (RNN) based models to generate more discriminative acoustic features and boost the performance of SER [2,3,4,5,6]. Most of these methods use static features as the network input to learn high-level features.…”
Section: Introduction (mentioning, confidence: 99%)
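For context on the technique named in the paper's title, here is a minimal sketch of a depthwise separable convolution block, the factorization such lightweight models are built on. This is a generic PyTorch illustration, not the indexed paper's actual architecture; the class name, channel counts, and log-mel input shape are assumptions for the example:

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise separable convolution: one spatial filter per channel
    (depthwise), then a 1x1 projection that mixes channels (pointwise).
    Generic sketch; not the architecture from the indexed paper."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # groups=in_ch applies an independent k x k filter to each channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        # 1x1 convolution recombines the per-channel outputs
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Hypothetical input: a batch of log-mel spectrograms (batch, 1, mels, frames)
x = torch.randn(4, 1, 64, 300)
block = nn.Sequential(SeparableConv2d(1, 32), nn.ReLU(),
                      SeparableConv2d(32, 64), nn.ReLU())
print(block(x).shape)  # torch.Size([4, 64, 64, 300])
```

The saving is easy to count: a standard 3x3 convolution from 32 to 64 channels needs 3·3·32·64 = 18,432 weights, while the separable version needs 3·3·32 + 32·64 = 2,336, roughly an 8x reduction; this factorization is what the "lightweight" in the title refers to.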