LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition

Kaya, Heysem; Fedotov, Dmitrii; Yesilkanat, Ali; Verkholyak, Oxana; Zhang, Yang; Karpov, Alexey

doi:10.21437/interspeech.2018-2298

Cited by 20 publications

(6 citation statements)

References 22 publications


“…[58,59]. A system for cross-corpus recognition of natural emotions in speech was created, based on long short-term memory (LSTM) recurrent neural networks; it includes feature preprocessing, domain adaptation, training, and prediction of the values of the emotional descriptors arousal and valence, and differs from its counterparts in the integrated use of several emotional speech corpora to train the system on segment-level annotation and to apply it to the classification of whole utterances [60,61]. A method was proposed for extracting geometric visual features that describe lip configuration based on 24 pairs of key points in computer images of the speaker's lips and mouth, which makes it possible to maximize the accuracy of tracking speakers' lip movements; it is distinguished by its use of video recordings of continuous Russian speech captured with a high-speed camera and improves the accuracy and robustness of audio-visual speech recognition and lip reading under real operating conditions with strong acoustic noise [62].…”

Section: R. M. Yusupov, D. V. Bakuradze, St. Petersburg Institute of Informat... (unclassified)

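История СПБ ФИЦ РАН: 45 Лет Научной Деятельности [History of the St. Petersburg Federal Research Center of the RAS: 45 Years of Scientific Activity]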

2023


This publication is dedicated to the 45th anniversary of the Federal State Budgetary Institution of Science "St. Petersburg Federal Research Center of the Russian Academy of Sciences"; it contains articles on the history of its founding and development, as well as copies of a number of informational and historical documents.


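“…$_{hh} \in \mathbb{R}^{h \times h}$ are weight matrix. Then the calculated $\overrightarrow{h}$ and $\overleftarrow{h}$ are connected to obtain the hidden state $h_t \in \mathbb{R}^{n \times 2h}$ at current time, and the output layer $o_t \in \mathbb{R}^{n \times q}$ is calculated by Equation (16).…”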

Section: Bi-directional Gated Recurrent Unit (mentioning)

confidence: 99%
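
The bi-directional GRU statement quoted above describes joining the forward and backward hidden states into a single state of width 2h and then computing an output of width q. A minimal sketch of that shape bookkeeping follows, in plain Python. The statement does not quote Equation (16), so the linear output layer used here (weights of shape 2h×q plus a bias) is an assumption, and the names `concat_states`, `linear_output`, `weights`, and `bias` are hypothetical.

```python
def concat_states(h_fwd, h_bwd):
    """Join forward and backward hidden states row by row: (n, h) + (n, h) -> (n, 2h)."""
    return [f + b for f, b in zip(h_fwd, h_bwd)]


def linear_output(h_t, weights, bias):
    """Assumed output layer: o_t = h_t * weights + bias, giving shape (n, q).

    weights has shape (2h, q); zip(*weights) iterates over its q columns.
    """
    return [
        [sum(x * w for x, w in zip(row, col)) + b
         for col, b in zip(zip(*weights), bias)]
        for row in h_t
    ]


# Toy example with n = 2 samples, h = 2 hidden units per direction, q = 1 output.
h_fwd = [[1.0, 2.0], [3.0, 4.0]]
h_bwd = [[5.0, 6.0], [7.0, 8.0]]
weights = [[1.0], [0.0], [0.0], [1.0]]  # hypothetical (2h, q) weight matrix
bias = [0.5]                            # hypothetical bias of length q

h_t = concat_states(h_fwd, h_bwd)
o_t = linear_output(h_t, weights, bias)
print(len(h_t), len(h_t[0]))  # n rows, 2h columns
print(len(o_t), len(o_t[0]))  # n rows, q columns
```

The sketch only illustrates the dimensions named in the quote; the recurrent updates that produce the forward and backward states are not shown.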

“…In recent years, deep learning has emerged as a prominent alternative to the traditional models, showcasing superior performance across various fields, including the realm of speech emotion recognition. Recent applications of DNN, RNN, CNN, LSTM, and other network models have reaped fruitful outputs in voice emotion recognition [16,17].…”

Section: Introduction (mentioning)

confidence: 99%

Speech Emotion Recognition Based on Deep Residual Shrinkage Network

et al. 2023


Speech emotion recognition (SER) technology is significant for human–computer interaction, and this paper studies the features and modeling of SER. The mel-spectrogram is introduced and used as the speech feature, and its theory and extraction process are presented in detail. A deep residual shrinkage network with bi-directional gated recurrent unit (DRSN-BiGRU) is proposed in this paper, which is composed of a convolution network, a residual shrinkage network, a bi-directional recurrent unit, and a fully-connected network. Through the self-attention mechanism, DRSN-BiGRU can automatically ignore noisy information and improve the ability to learn effective features. Network optimization and verification experiments are carried out on three emotional datasets (CASIA, IEMOCAP, and MELD), on which DRSN-BiGRU reaches accuracies of 86.03%, 86.07%, and 70.57%, respectively. The results are also analyzed and compared with DCNN-LSTM, CNN-BiLSTM, and DRN-BiGRU, which verifies the superior performance of DRSN-BiGRU.


“…With the recent development of DL algorithms in behavioral signal processing and affective computing, the DL-based emotion recognition algorithms have received significant attention. Notable DL-based emotion recognition approaches include Long-Short Time Memory architectures [12] [13] [14], deep neural network (DNN) [15], convolutional neural network (CNN) [16] [17] [18], and bidirectional long short-term memory (BLSTM) [19]. Among the DL-based models, CNNs have been shown to be effective in detecting emotions, due to its capability in characterizing local temporal-spectral structures of speech and audio signals, as well as its generalisation ability and recognition accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Learning for Audio Visual Emotion Recognition

Hussain, Wang, Bouaynaya, et al. 2022

2022 25th International Conference on Information Fusion (FUSION)


Human emotions can be presented in data with multiple modalities, e.g. video, audio and text. An automated system for emotion recognition needs to consider a number of challenging issues, including feature extraction and dealing with variations and noise in data. Deep learning has been used extensively in recent years, offering excellent performance in emotion recognition. This work presents a new method based on audio and visual modalities, where visual cues facilitate the detection of speech or non-speech frames and of the emotional state of the speaker. Different from previous works, we propose the use of novel speech features, e.g. the Wavegram, which is extracted with a one-dimensional Convolutional Neural Network (CNN) learned directly from time-domain waveforms, and Wavegram-Logmel features, which combine the Wavegram with the log mel spectrogram. The system is then trained in an end-to-end fashion on the SAVEE database, also taking advantage of the correlations among the streams. It is shown that the proposed approach outperforms the traditional and state-of-the-art deep learning based approaches, built separately on auditory and visual handcrafted features, for the prediction of spontaneous and natural emotions.
