Deep Learning for Emotional Speech Recognition

Sánchez-Gutiérrez, Máximo Eduardo; Albornoz, Enrique Marcelo; Licona, Fabiola Martínez; Rufiner, Hugo Leonardo; Goddard, John C.

doi:10.1007/978-3-319-07491-7_32

Cited by 22 publications

(11 citation statements)

References 16 publications


“…Deep Neural Networks (DNNs) are large scale version of feedforward neural networks that have been successfully validated for learning the complex functional relations. Use of DNN for detecting emotions from speech and music have been well accounted in literature [35]. Each feature dimension was normalized to have zero mean and unit variance before feeding it to DNN.…”

Section: Results (mentioning)

confidence: 99%

BigEAR: Inferring the Ambient and Emotional Correlates from Smartphone-Based Acoustic Big Data

Dubey, Mehl, Mankodiya (2016)

2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE)


Abstract: This paper presents a novel BigEAR big data framework that employs psychological audio processing chain (PAPC) to process smartphone-based acoustic big data collected when the user performs social conversations in naturalistic scenarios. The overarching goal of BigEAR is to identify moods of the wearer from various activities such as laughing, singing, crying, arguing, and sighing. These annotations are based on ground truth relevant for psychologists who intend to monitor/infer the social context of individuals coping with breast cancer. We pursued a case study on couples coping with breast cancer to know how the conversations affect emotional and social well being. In the state-of-the-art methods, psychologists and their team have to hear the audio recordings for making these inferences by subjective evaluations that not only are time-consuming and costly, but also demand manual data coding for thousands of audio files. The BigEAR framework automates the audio analysis. We computed the accuracy of BigEAR with respect to the ground truth obtained from a human rater. Our approach yielded overall average accuracy of 88.76% on real-world data from couples coping with breast cancer.
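The citation statement above notes that each feature dimension was normalized to zero mean and unit variance before being fed to the DNN. The following is a minimal sketch of that kind of per-dimension standardization, not the cited authors' code. The function name `standardize_columns`, the toy `features` matrix, and the choice of the population standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Scale each feature column to zero mean and unit variance.

    `rows` is a list of equal-length feature vectors (hypothetical input
    layout: one row per utterance, one column per acoustic feature).
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation per column; a constant column would give
    # 0.0, so fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Toy data standing in for extracted acoustic features (illustrative only).
features = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
normalized = standardize_columns(features)
```

In a typical training setup the means and standard deviations would be estimated on the training set and reused for test data; whether the cited work did so is not stated in the excerpt.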


“…In the works of Brueckner et al further related speaker states and traits from the ISCA Interspeech Computational Paralinguistics Challenges have been considered, often outperforming the best results obtained in those [3,4,5,6]. Further examples for emotional speech recognition include [7,25,26,1,21,29]. In a related way, deep learning has also been successfully applied to emotion recognition in music [31].…”

Section: Deep Learning (mentioning)

confidence: 98%

Deep Learning Our Everyday Emotions

Schuller (2015)

Advances in Neural Networks: Computational and Theoretical Issues


Abstract. Emotion is omnipresent in our daily lives and has a significant influence on our functional activities. Thus, computer-based recognising and monitoring of affective cues can be of interest such as when interacting with intelligent systems, or for health-care and security reasons. In this light, this short overview focuses on audio/visual and textual cues as input feature modality for automatic emotion recognition. In particular, it shows how these can best be modelled in a Neural Network context. This includes deep learning, and sparse auto-encoders for transfer learning of a compact task and population representation. It further shows avenues towards massively autonomous rich multitask learning and required confidence estimation as is needed to prepare such technology for real-life application.


“…Layered (i.e. stacked) RBMs provide a vetted system for using probabilistic models to infer relationships between features in a variety of fields [11,13,26,27]. RBMs also have an impressive ability to provide contextual inference in noisy datasets, however an alternative is to use Generative Stochastic Networks (GSNs).…”

Section: Future Work (mentioning)

confidence: 99%

Evaluating Unsupervised Fault Detection in Self-Healing Systems Using Stochastic Primitives

Schneider, Barker, Dobson (2015)

EAI Endorsed Transactions on Self-Adaptive Systems


Autonomous fault detection represents one approach for reducing operational costs in large-scale computing environments. However, little empirical evidence exists regarding the implementation or comparison of such methodologies, or offers proof that such approaches reduce costs. This paper compares the effectiveness of several types of stochastic primitives using unsupervised learning to heuristically determine the root causes of faults. The results suggest that self-healing systems frameworks leveraging these techniques can reliably and autonomously determine the source of an anomaly within as little as five minutes. This finding lays the foundation for determining the potential these approaches have for reducing operational costs and ultimately concludes with new avenues for exploring anomaly prediction.
