Interspeech 2019
DOI: 10.21437/interspeech.2019-1649

Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition

Abstract: Discrete speech emotion recognition (SER), the assignment of a single emotion label to an entire speech utterance, is typically performed as a sequence-to-label task. This approach is limited, however, in that the resulting models may fail to capture temporal changes in the speech signal, including those indicative of a particular emotion. One potential solution to overcome this limitation is to model SER as a sequence-to-sequence task instead. In this regard, we have developed an attention-based bidirecti…
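The sequence-to-sequence framing described in the abstract relies on CTC's collapsing rule to map a frame-level label sequence onto a much shorter utterance-level one. A minimal sketch of that collapsing rule (not the paper's model; the emotion labels and blank symbol below are illustrative placeholders): consecutive repeated labels are merged, then blank frames are dropped.

```python
def ctc_collapse(frame_labels, blank="-"):
    """Apply the CTC collapsing rule: merge consecutive repeats,
    then drop blank symbols."""
    collapsed = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            collapsed.append(lab)
        prev = lab
    return collapsed

# A frame-level decoding such as ["-", "ang", "ang", "-", "ang"]
# collapses to ["ang", "ang"].
print(ctc_collapse(["-", "ang", "ang", "-", "ang"]))
```

Under this rule a long sequence of per-frame emotion hypotheses reduces to a handful of emotion segments, which is what makes a sequence-to-sequence formulation of a nominally sequence-to-label task tractable.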

Cited by 52 publications (39 citation statements)
References 22 publications
“…It is also worth noting that the data distribution of each emotion class is heavily imbalanced. Therefore, following the approach of [50, 51], we merged the happiness and excitement utterances into the happiness class. We used four categories of emotions—namely neutral, happiness, sadness, and anger—for training and evaluation.…”
Section: Experiments and Results
confidence: 99%
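The class-merging step quoted above amounts to a simple label remapping before training. A hedged sketch, assuming IEMOCAP-style string labels (the label names and the `surprise` example are illustrative, not taken from the cited work):

```python
from collections import Counter

# Merge excitement into happiness, as in the statement quoted above;
# keep only the four target emotion classes.
LABEL_MAP = {"excitement": "happiness"}
KEPT = {"neutral", "happiness", "sadness", "anger"}

def remap(labels):
    """Map raw labels through LABEL_MAP and discard any class
    outside the four kept categories."""
    merged = [LABEL_MAP.get(lab, lab) for lab in labels]
    return [lab for lab in merged if lab in KEPT]

raw = ["neutral", "excitement", "happiness", "anger", "surprise", "sadness"]
print(Counter(remap(raw)))
```

Counting the remapped labels also makes the class imbalance the statement mentions easy to inspect before choosing a training strategy.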
“…Attention mechanisms have been adopted in several works such as [24], [27], [47] and [22]. Different from our work, [24] investigated learning salient frames through an attentive CNN with multi-view learning objective function.…”
Section: Comparison With the State of the Art
confidence: 98%
“…parts for the whole utterance. In [47], two attention mechanisms were investigated to learn the emotionally relevant frames within the BLSTM-CTC framework: one is component attention, the other quantum attention.…”
Section: Comparison With the State of the Art
confidence: 99%
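The component attention mentioned in this statement weights each frame by its estimated emotional salience before pooling frames into an utterance-level representation. A minimal sketch of such softmax-weighted frame pooling (the scores and feature vectors are toy values, not the attention parameterization of [47]):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, scores):
    """Pool frame feature vectors into one utterance vector,
    weighting each frame by its softmax-normalized salience score."""
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames))
            for d in range(dim)]

# Three 2-dim frame vectors; the first frame is far more salient.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
scores = [5.0, 0.0, 0.0]
print(attention_pool(frames, scores))
```

Because the weights sum to one, the pooled vector stays in the convex hull of the frame features, and a single emotionally salient frame can dominate the utterance representation.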
“…Since Google greatly improved the accuracy of machine translation [24], the attention mechanism has seen increasingly wide use in deep learning. In speech processing, attention mechanisms have been applied to many tasks, such as ASR [25], speaker recognition [26], and SER [2], [20], [21], [27], as in our work.…”
Section: Related Work
confidence: 99%