Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications

Prabhakar, Gudmalwar Ashishkumar; Basel, Biplove; Dutta, Anirban; Rao, Ch. V. Rama

doi:10.1109/tce.2023.3236972

Cited by 17 publications

(5 citation statements)

References 53 publications

“…3) Several kinds of features commonly extracted [6]: 4) Zero Crossing Rate (ZCR): ZCR indicates the frequency at which a signal crosses the zero-amplitude level, characterizing changes in signal amplitude. 7) Modified Group Delay Function (MODGD) [7]: MODGD extracts phase information from sound signals by calculating the group delay function, offering insights into phase characteristics.…”

Section: Preprocessing and Feature Engineering (mentioning)

confidence: 99%
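
The two features described in this statement, ZCR and MODGD, can be illustrated with a short plain-Python sketch. It is not the implementation from [7] or from the citing review. `zero_crossing_rate` and `modified_group_delay` are hypothetical names, the naive DFT is only suitable for short frames, and the modified group delay omits the cepstral smoothing of the spectrum (it uses S(k) = |X(k)| directly). With the assumed defaults alpha = gamma = 1, it therefore reduces to the ordinary group delay τ(k) = (X_R Y_R + X_I Y_I) / |X(k)|², where Y is the DFT of n·x[n].

```python
import cmath
import math


def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(signal, signal[1:])
        if (a >= 0) != (b >= 0)  # zero is treated as non-negative
    )
    return crossings / (len(signal) - 1)


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def modified_group_delay(frame, alpha=1.0, gamma=1.0, eps=1e-12):
    """Group delay per DFT bin, with MODGD-style exponents (hypothetical helper).

    tau_p(k) = (X_R*Y_R + X_I*Y_I) / S(k)^(2*gamma),  Y = DFT{n * x[n]}
    tau_m(k) = sign(tau_p) * |tau_p|^alpha
    Simplification (assumption): S(k) = |X(k)| rather than a cepstrally
    smoothed spectrum, so alpha = gamma = 1 gives the plain group delay.
    """
    X = dft(frame)
    Y = dft([n * v for n, v in enumerate(frame)])
    out = []
    for xk, yk in zip(X, Y):
        s = max(abs(xk), eps)  # guard against division by zero
        tau_p = (xk.real * yk.real + xk.imag * yk.imag) / s ** (2 * gamma)
        out.append(math.copysign(abs(tau_p) ** alpha, tau_p))
    return out
```

For an impulse delayed by d samples, every bin of the group delay equals d, which makes a quick sanity check. Setting alpha below 1 compresses the dynamic range of the values, which is the role this exponent plays in MODGD.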

“…2D CNN, involving the transformation of audio signals into spectrogram form, may result in the loss of some temporal information. The CNN-BLSTM architecture [7], capturing local features at different scales, presents challenges due to the need for precise parameter tuning and high computational requirements.…”

Section: Feature Engineering (mentioning)

confidence: 99%

“…These are all achieved by establishing a new model, which is based on the old model, or by optimizing the old model. For example, quantifiable recognition of language signals; Proposing an end-to-end model consisting of a convolutional neural network (CNN) for recognizing natural language emotions based on existing deep neural networks (DNNs) [3]; Research on the recognition accuracy of the inherent time relationship of speech waveforms; Optimization of Mel frequency spectrum coefficients (MFCCs) and phase packaging of Mel spectra [7]; The replacement of the old hidden Markov model (HMM) model, and so on.…”

Section: Application Diversity (mentioning)

confidence: 99%

“…These seven different articles all emphasize the importance of studying how computers recognize human natural language emotions, as well as the significant changes it can bring to various fields. The aspects of natural language emotion recognition they explored are quantifiable recognition of language signal sensors, enabling them to be applied to different aspects such as human restart interaction, virtual reality, behavior evaluation, healthcare, and emergency call centers; Propose an end-to-end model consisting of a convolutional neural network (CNN) for recognizing natural language emotions based on existing deep neural networks (DNNs) [3]; Research on the recognition accuracy of the inherent time relationship of speech waveforms, and a new language recognition method proposed to address the issue of insufficient recognition accuracy [2]; Optimize the phase packaging of Mel frequency spectrum coefficients (MFCCs) and Mel spectrograms to solve problems such as signal processing difficulties and phase information being ignored [7]; Solve the problem of difficulty in conveniently detecting human hidden emotions by establishing a single, standardized formula due to the diversity and complexity of human language [6]; There is also the function of natural language emotion recognition to help children with autism spectrum disorder (ASD) recognize emotions in social interactions and help them overcome the harm caused by ASD [5]; And by building a platform for intelligent home robots, they can achieve simple operations such as cleaning and hygiene by recognizing the natural language of their owners [1].…”

Section: Introduction (mentioning)

confidence: 99%
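
The statements above describe the cited work as addressing the "phase packaging" of Mel spectra and the neglect of phase information. Assuming this phrase refers to phase wrapping (measured phase is confined to the principal interval [-π, π]), the standard remedy is phase unwrapping. Whenever the step between consecutive phase samples exceeds π in magnitude, the whole multiple of 2π that brings it back into range is removed. The sketch below is a generic illustration in the spirit of NumPy's `unwrap`, not the method of [7]. The names `wrap` and `unwrap` are hypothetical.

```python
import math


def wrap(phase):
    """Map an angle to the principal interval [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))


def unwrap(phases):
    """Remove artificial 2*pi jumps between consecutive phase samples."""
    if not phases:
        return []
    out = [phases[0]]
    for prev, cur in zip(phases, phases[1:]):
        step = cur - prev
        # Subtract whole turns so the step lies within [-pi, pi].
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out
```

Wrapping a linear phase ramp and then unwrapping it recovers the ramp, provided the true phase changes by less than π between adjacent samples. Larger true jumps cannot be distinguished from wraps, which is a known limitation of phase unwrapping.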

Advancements and challenges in speech emotion recognition: a comprehensive review

Wang, Yin, Zhou et al. 2024

Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024)

As the importance of human-computer interaction (HCI) continues to strengthen and the field of deep learning evolves, numerous models have found their application in the realm of Speech Emotion Recognition (SER), leading to significant advancements in recent years. However, effectively recognizing and processing human emotions through computational systems remains a complex and formidable challenge. This review aims to provide a comprehensive summary of the latest accomplishments in SER, encompassing a diverse range of application scenarios, from education and healthcare to criminal investigation. Additionally, it delves into various models and preprocessing techniques such as Convolutional Neural Networks (CNN), Convolutional Recurrent Neural Networks (CRNN), Long Short-Term Memory (LSTM), and datasets like RAVDESS and RECOLA, which encompass a wide array of scenes and languages. While the recent strides in SER have undeniably achieved impressive accuracy rates, a notable gap exists in research that addresses more intricate emotional contexts, including situations involving irony or sarcasm. Consequently, this review focuses on a comprehensive analysis of the limitations inherent in different feature engineering strategies. Moreover, it investigates the challenge of interpretability posed by complex models, the constraint posed by singular and hard-to-gather datasets, and the expansive scope of potential applications SER could serve. Considering these complexities, a potential pathway to further enhance SER's effectiveness and applicability is proposed. This involves exploring the concept of non-binary emotion classification, harnessing rich contextual information, and integrating datasets that incorporate gesture and textual data. By adapting feature extraction techniques to align with the unique demands of specific scenarios, the performance of SER models could be markedly improved.

“…Prabhakar, Basel, Dutta and Rao [33] developed a multichannel Convolution Neural Network-Bidirectional Long Short Term Memory (CNN-BLSTM) architecture with an attention mechanism for speaker-independent SER by considering phase and magnitude spectrum-based features.…”

Section: Introduction (mentioning)

confidence: 99%
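
This statement mentions an attention mechanism on top of the CNN-BLSTM. One common form of attention in SER models is attention pooling over frame-level outputs. Each frame vector h_t receives a score e_t = w·h_t, the scores are normalized with a softmax into weights α_t, and the utterance representation is the weighted sum c = Σ_t α_t h_t. The sketch below shows only that generic pattern. The scoring vector `w` stands in for a learned parameter, `attention_pool` is a hypothetical name, and whether [33] uses exactly this form is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max before exponentiating
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Collapse T frame vectors (T x D, e.g. BLSTM outputs) into one D-dim vector.

    e_t = w . h_t ;  alpha = softmax(e) ;  c = sum_t alpha_t * h_t
    Returns the pooled vector c and the attention weights alpha.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return context, weights
```

With a zero scoring vector, every frame gets equal weight and the result is the mean over time. Larger scores concentrate the weight on the frames the model judges most emotionally salient.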

Real-Time Speech Emotion Recognition Using Deep Learning and Data Augmentation

Barhoumi, Ayed 2023

Preprint

In human-human interactions, detecting emotions is often easy, as they can be perceived through facial expressions, body gestures, or speech. However, in human-machine interactions, detecting human emotion can be a challenge. To improve this interaction, the term 'speech emotion recognition' has emerged, with the goal of recognizing emotions solely through vocal intonation. In this work, we propose a speech emotion recognition system based on deep learning approaches and two efficient data augmentation techniques (noise addition and spectrogram shifting). To evaluate the proposed system, we used three different datasets: TESS, EmoDB, and RAVDESS. We employ several algorithms, such as Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Mel spectrograms, Root Mean Square Value (RMS), and chroma, to select the most appropriate vocal features that represent speech emotions. To develop our speech emotion recognition system, we use three different deep learning models: a MultiLayer Perceptron (MLP), a Convolutional Neural Network (CNN), and a hybrid model that combines CNN with Bidirectional Long Short-Term Memory (Bi-LSTM). By exploring these different approaches, we were able to identify the most effective model for accurately identifying emotional states from speech signals in real-time situations. Overall, our work demonstrates the effectiveness of the proposed deep learning model, specifically the one based on CNN+BiLSTM, and of the two data augmentation techniques used for the proposed real-time speech emotion recognition.
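
This abstract names two augmentation techniques (noise addition and spectrogram shifting) and lists RMS among its features, but it does not define them. The sketch below shows one common reading of each, under stated assumptions. Noise addition is modeled as white Gaussian noise scaled to a target signal-to-noise ratio. Spectrogram shifting is modeled as a circular shift of time frames. RMS is the square root of the mean squared amplitude. The function names and the `snr_db` and `seed` parameters are hypothetical, not taken from the cited preprint.

```python
import math
import random


def rms(signal):
    """Root mean square amplitude of a 1-D signal."""
    if not signal:
        return 0.0
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def add_noise(signal, snr_db, seed=None):
    """Add white Gaussian noise so that signal power / noise power = snr_db (dB)."""
    if not signal:
        return []
    rng = random.Random(seed)  # private generator for reproducibility
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, std) for s in signal]


def shift_frames(spectrogram, k):
    """Circularly shift a spectrogram (list of time frames) by k frames."""
    if not spectrogram:
        return []
    k %= len(spectrogram)  # also handles negative and oversized shifts
    return spectrogram[-k:] + spectrogram[:-k]
```

Seeding a private `random.Random` keeps augmentation reproducible without touching the global generator. A circular shift keeps the number of frames fixed; a zero-padded shift would be an equally plausible alternative reading.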

Everything you wanted to know about ChatGPT: Components, capabilities, applications, and opportunities

Heidari, Navimipour, Zeadally et al. 2024

Internet Technology Letters

Conversational Artificial Intelligence (AI) and Natural Language Processing have advanced significantly with the creation of a Generative Pre-trained Transformer (ChatGPT) by OpenAI. ChatGPT uses deep learning techniques like transformer architecture and self-attention mechanisms to replicate human speech and provide coherent replies appropriate to the situation. The model mainly depends on the patterns discovered in the training data, which might result in incorrect or illogical conclusions. In the context of open-domain chats, we investigate the components, capabilities, constraints, and potential applications of ChatGPT along with future opportunities. We begin by describing the components of ChatGPT, followed by a definition of chatbots. We present a new taxonomy to classify them. Our taxonomy includes rule-based chatbots, retrieval-based chatbots, generative chatbots, and hybrid chatbots. Next, we describe the capabilities and constraints of ChatGPT. Finally, we present potential applications of ChatGPT and future research opportunities. The results showed that ChatGPT, a transformer-based chatbot model, utilizes encoders to produce coherent responses.
