Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets

Noh, Kyoung Ju; Jeong, Chi Yoon; Lim, Jiyoun; Chung, Seungeun; Kim, Gague; Lim, Jeong Mook; Jeong, Hyuntae

doi:10.3390/s21051579

Cited by 15 publications

(4 citation statements)

References 49 publications


“…Despite the enormous success contributions in emotion recognition in English datasets, there is still gab in Arabic dataset and emotion recognition systems utilizes these Arabic datasets. Some Arabic speeches emotion datasets have been proposed in the literature, see [1]- [3], [5], [19]. Each dataset has a different set of classes or labels, for example, the Arabic audio acted dataset proposed in [20] has five labels (Happiness, Sadness, Neutral, Anger, Fear), and the dataset proposed in [15] has three classes (Happy, Surprised, and Angry), while the dataset proposed in [19] has labels (Happy, Sad, Neutral, Angry, Surprise, Disgust).…”

Section: Arabic Baved Dataset (mentioning)

confidence: 99%

“…Despite the enormous success contributions in emotion recognition in English datasets, there is still a gab in Arabic dataset and emotion recognition systems utilizes these Arabic datasets. Various Arabic speeches emotion datasets have been proposed in the literature, whether audio or visual, see [1]- [4].…”

Section: Introduction Researchers and Scientists Have Used Deep Learn... (mentioning)

confidence: 99%


Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset

Mohamed, Aly. 2021. Preprint.


Recently, there have been tremendous research outcomes in the fields of speech recognition and natural language processing, owing to well-developed multilayer deep learning paradigms such as wav2vec2.0, Wav2vecU, WavBERT, and HuBERT, which provide better representation learning and capture more information. Such paradigms are pretrained on hundreds of unlabeled data and then fine-tuned on a small dataset for specific tasks. This paper introduces a deep-learning-based emotion recognition model for Arabic speech dialogues. The developed model employs state-of-the-art audio representations, including wav2vec2.0 and HuBERT. The experimental results of our model surpass previously known outcomes.


“…The proposed method achieved the best performance over all other baseline methods. Noh et al [83] proposed a multipath and group-loss-based network (MPGLN), which supports supervised domain adaptation from multiple environments. It is an ensemble learning model based on a temporal feature generator using BiLSTM, a transferred feature extractor from the pretrained VGG-like audio classification model, and simultaneous minimisation of multiple losses.…”

Section: Cross-domain Recognition (mentioning)

confidence: 99%

A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism

et al. 2021


Emotions are an integral part of human interactions and are significant factors in determining user satisfaction or customer opinion. Speech emotion recognition (SER) modules also play an important role in the development of human–computer interaction (HCI) applications. A tremendous number of SER systems have been developed over the last decades. Attention-based deep neural networks (DNNs) have been shown to be suitable tools for mining information that is unevenly distributed in time in multimedia content. The attention mechanism has recently been incorporated into DNN architectures to also emphasise emotionally salient information. This paper provides a review of recent developments in SER and also examines the impact of various attention mechanisms on SER performance. An overall comparison of system accuracies is performed on the widely used IEMOCAP benchmark database.
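The citation statement above describes MPGLN as an ensemble of a BiLSTM-based temporal path and a transferred VGG-like path trained with "simultaneous minimisation of multiple losses". The statements quoted here do not give the loss formula, so the following is only a minimal sketch of one way several per-path losses could be combined into a single objective. The equal default weights, the averaging of path logits into an ensemble head, and every function name are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample given raw logits and the true class index."""
    return -math.log(softmax(logits)[label])


def group_loss(path_logits, label, weights=None):
    """Hypothetical combined loss: one term per path plus one ensemble term.

    path_logits: list of logit lists, one per path (e.g. temporal, transferred).
    Returns (total, parts), where parts holds each head's own loss.
    """
    n_paths = len(path_logits)
    n_classes = len(path_logits[0])
    # Assumption: the ensemble head simply averages the path logits.
    ensemble = [sum(p[c] for p in path_logits) / n_paths for c in range(n_classes)]
    heads = path_logits + [ensemble]
    if weights is None:
        weights = [1.0] * len(heads)  # assumption: equal weighting
    parts = [cross_entropy(h, label) for h in heads]
    total = sum(w * l for w, l in zip(weights, parts))
    return total, parts


# Made-up logits for a four-class emotion problem; the true class is index 0.
temporal_path = [2.0, 0.5, 0.1, -1.0]
transferred_path = [1.0, 1.5, 0.0, -0.5]
total, parts = group_loss([temporal_path, transferred_path], label=0)
print(total, parts)
```

Minimising the summed value pushes every head toward the correct label at once, rather than training each path separately; in a real model the same sum would be back-propagated through all paths by a deep learning framework.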


“…This issue also includes a speech-emotion-recognition study [ 6 ] that proposed a multi-path and group-loss-based network (MPGLN) for emotion recognition to support multi-domain adaptation. The authors proposed a model that includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish).…”

mentioning

confidence: 99%

Special Issue “Emotion Intelligence Based on Smart Sensing”

Park, Whang. 2023. Sensors.

