2020
DOI: 10.1109/access.2020.3014733
Learning Salient Segments for Speech Emotion Recognition Using Attentive Temporal Pooling

Abstract: In the temporal process of expressing emotions, some intervals embed more salient emotional information than others. In this paper, by introducing an attentive temporal pooling module into a deep neural network (DNN) architecture, we present a simple but effective speech emotion recognition (SER) framework that automatically highlights the emotionally salient segments while suppressing the influence of less relevant ones. For an input speech utterance, the extracted feature sequence of hand-cra…
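The core idea in the abstract — scoring each frame's emotional relevance and pooling the sequence with those scores — can be illustrated with a minimal NumPy sketch. This is not the authors' exact architecture (their saliency weights involve a GMM and an additional DNN); here the attention vector `w` is a hypothetical learned parameter, and the pooling is a plain softmax-weighted sum over time:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attentive_temporal_pooling(frames, w):
    """Pool a (T, D) frame-level feature sequence into one (D,) utterance vector.

    frames: (T, D) array of per-frame acoustic features
    w: (D,) hypothetical learned attention vector (stands in for the
       saliency model described in the paper)
    """
    scores = frames @ w        # (T,) scalar relevance score per frame
    alpha = softmax(scores)    # normalized saliency weights; sums to 1
    utterance = alpha @ frames # weighted sum over time: salient frames dominate
    return utterance, alpha

# Toy usage: 5 frames of 3-dimensional features
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 3))
w = rng.normal(size=3)
utterance_vec, alpha = attentive_temporal_pooling(frames, w)
```

Because the weights `alpha` are produced end-to-end from the features, only an utterance-level emotion label is needed to train such a module — no frame- or segment-level annotation — which is the property the citing works below emphasize.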

Cited by 9 publications (4 citation statements). References 44 publications.
“…Xia et al. [36] proposed a DNN-based SER approach that captures temporal segment-level aspects of low-level features of voice signals, using low-level descriptors of the emotion signal related to energy, spectral, statistical, and voicing characteristics.…”
Section: Related Work
confidence: 99%
“…The Gaussian Mixture Model (GMM) and an additional DNN are used to extract emotional saliency weights from condensed representations. Notably, our methodology relies only on utterance-level labels yet achieves state-of-the-art SER performance on several public emotion datasets, such as RML, EMO-DB, and IEMOCAP, without requiring supervisory information at the frame or segment level [6].…”
Section: Related Work
confidence: 99%
“…It handles domain mismatch and data perturbations. Smoothing the adversarial model required a larger dataset [57]. The phase and loudness of speech reduce the frame-clipping effect in SER.…”
Section: SER Using Machine Learning Based Techniques
confidence: 99%