2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2018
DOI: 10.23919/apsipa.2018.8659587

Attention Based Fully Convolutional Network for Speech Emotion Recognition

Abstract: Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which means it is hard to distinguish; 2) in general, human emotion can only be detected in some specific moments during a long utterance; 3) speech data with emotional labeling is usually limited. In this paper, we present a novel attention based fully convolutional network for speech emotion recognition. We employ fully convolutional network as it is able to handle variable-length speech, free of the demand…

Citations: cited by 118 publications (93 citation statements)
References: 25 publications
“…In [17], we proposed a novel attention based fully convolutional neural network for audio emotion recognition. The proposed attention mechanism helps the model focus on the emotion-relevant regions in speech spectrogram.…”
Section: The Proposed Architecture (mentioning)
confidence: 99%
“…The typical CNNs, including AlexNet [23], VGGNet [24], and ResNet [25] take a fixed-size input due to the limitation of fully connected layers. Considering the loss of information caused by the fixed-size input, we proposed a fully convolutional network to handle variable-length speech in [17]. In this study, the same is used as audio encoder, which is shown in Fig.…”
Section: A. Audio Stream (mentioning)
confidence: 99%
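
The two ideas in the statements above (no fixed-size input, and attention that weights the emotion-relevant regions) can be illustrated with a minimal sketch. This is not the paper's architecture: the function name `attention_pool`, the scoring vector `u`, and the toy feature values are all assumptions made for illustration, and a simple dot-product score followed by a softmax stands in for whatever attention formulation the authors use. The point is only that a weighted sum over however many positions the encoder produces yields a fixed-length vector, so no fully connected layer has to see a fixed-size input.

```python
import math


def attention_pool(features, u):
    """Pool T feature vectors (each of length D) into one length-D vector.

    features: list of T lists, e.g. one vector per time-frequency position
              of a convolutional feature map (T may differ per utterance).
    u:        hypothetical learned scoring vector of length D (an assumption,
              not a parameter named in the source).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per position.
    scores = [sum(f_d * u_d for f_d, u_d in zip(f, u)) for f in features]

    # Numerically stable softmax over the T positions.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum: the output length depends on D only, not on T.
    dim = len(u)
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Two toy "utterances" of different lengths, same feature dimension.
short_utt = [[1.0, 0.0], [0.0, 1.0]]
long_utt = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5], [3.0, 0.0]]
u = [1.0, 0.5]

for utt in (short_utt, long_utt):
    pooled, weights = attention_pool(utt, u)
    print(len(utt), "positions ->", len(pooled), "pooled features")
```

Both utterances produce a pooled vector of the same length, and positions with higher scores contribute more to it, which is the sense in which attention lets a model "focus" on particular regions.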