2022
DOI: 10.1109/access.2022.3163856

Hybrid LSTM-Transformer Model for Emotion Recognition From Speech Audio Files

Citations: Cited by 57 publications (18 citation statements)
References: 25 publications
“…The convolutional layer‐based transformer is motivated by the success of the transformer and its variants in speech‐processing applications [65–67]. Instead of using a conventional transformer, this study uses convolution layers and multi‐head attention blocks to construct this module.…”
Section: Proposed SER System (mentioning, confidence: 99%)
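The excerpt above describes a transformer-like module built from convolution layers and multi-head attention blocks rather than a conventional transformer. The sketch below is only an illustration of that general pattern, assuming a PyTorch implementation with a 1-D convolution front-end and self-attention over the frame axis; the channel width, kernel size, and head count are placeholders, not values from the cited work.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """One convolution + multi-head attention block (illustrative only;
    layer sizes and ordering are assumptions, not the cited paper's exact design)."""
    def __init__(self, channels=128, n_heads=4, kernel_size=3):
        super().__init__()
        # Convolution captures local spectral/temporal patterns in the feature sequence.
        self.conv = nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x):                                 # x: (batch, time, channels)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)  # Conv1d expects (batch, channels, time)
        x = self.norm1(x + y)                              # residual + norm around the convolution
        y, _ = self.attn(x, x, x)                          # multi-head self-attention over time
        return self.norm2(x + y)                           # residual + norm around the attention

# Example: a batch of 8 utterances, 200 frames, 128-dim features.
block = ConvAttentionBlock()
print(block(torch.randn(8, 200, 128)).shape)               # torch.Size([8, 200, 128])
```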
“…After training, this paper successfully constructs an audio-modal emotion-recognition model based on the "time-distributed CNNs + LSTMs" scheme and records the detailed parameters of each layer in the model. In the test phase, the performance of the model was evaluated using the RAVDESS dataset; six emotions were classified and predicted; and the "time-distributed CNNs + LSTMs" scheme was combined with the "SVM on global statistical features" [49] program and the "hybrid LSTM-transformer model" [50] in a comparative experiment. The specific effects are shown in Table 7 below.…”
Section: Training and Evaluation of Audio-Modal Emotion-Recognition M... (mentioning, confidence: 99%)
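The "time-distributed CNNs + LSTMs" scheme quoted above applies the same small CNN to each spectrogram chunk and models the resulting sequence of chunk embeddings with an LSTM. The following is a minimal sketch of that idea under assumed shapes (chunk size, feature width, six RAVDESS emotion classes); it is not the citing paper's exact network.

```python
import torch
import torch.nn as nn

class TimeDistributedCNNLSTM(nn.Module):
    """Minimal 'time-distributed CNNs + LSTMs' sketch: one shared CNN is applied to
    every spectrogram chunk, and an LSTM models the resulting sequence.
    All shapes and layer sizes here are illustrative assumptions."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.cnn = nn.Sequential(                      # shared across time steps
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),                              # -> (batch*steps, 32)
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                              # x: (batch, steps, 1, mel, frames)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # apply the CNN per chunk
        _, (h, _) = self.lstm(feats)                      # last hidden state summarizes the clip
        return self.fc(h[-1])                             # logits over the six emotions

# Example: 4 clips, each split into 10 chunks of a 64x32 log-mel patch.
model = TimeDistributedCNNLSTM()
print(model(torch.randn(4, 10, 1, 64, 32)).shape)          # torch.Size([4, 6])
```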
“…In recent studies, transformer-based SER models have been proposed. Andayani et al. [36] proposed a hybrid model that replaced the position encodings of the transformer encoder with LSTM in order to learn contextualized long-term dependencies for emotion recognition. Pre-trained models and data augmentation techniques have also been used to improve SER performance in recent research.…”
Section: Related Work (mentioning, confidence: 99%)
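The description above (an LSTM standing in for the transformer encoder's positional encodings, so that order information comes from recurrence) can be sketched as follows. This is an assumed, simplified PyTorch rendering of the general idea, with illustrative layer sizes and MFCC-style inputs rather than Andayani et al.'s reported configuration.

```python
import torch
import torch.nn as nn

class HybridLSTMTransformer(nn.Module):
    """Sketch of the general idea: an LSTM replaces the transformer's positional
    encoding before the self-attention encoder. Layer counts and sizes are
    illustrative assumptions, not the paper's reported setup."""
    def __init__(self, n_features=40, d_model=128, n_heads=4, n_layers=2, n_classes=8):
        super().__init__()
        # LSTM injects sequence order, replacing sinusoidal/learned positional encodings.
        self.lstm = nn.LSTM(n_features, d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                      # x: (batch, frames, n_features), e.g. MFCCs
        x, _ = self.lstm(x)                    # contextualized, order-aware frame embeddings
        x = self.encoder(x)                    # multi-head self-attention over the frames
        return self.fc(x.mean(dim=1))          # mean-pool over time, then classify

# Example: 2 utterances, 300 frames of 40-dim MFCC features.
model = HybridLSTMTransformer()
print(model(torch.randn(2, 300, 40)).shape)    # torch.Size([2, 8])
```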