Deep Learning Techniques for Speech Emotion Recognition, from Databases to Models

Abbaschian, Babak Joze; Sierra-Sosa, Daniel; Elmaghraby, Adel

doi:10.3390/s21041249

Cited by 192 publications

(73 citation statements)

References 85 publications

“…The aforementioned observation was proven by several researchers [3][4][5][6][7][8][9][10][11][12][13][14]. DL is beneficial in other fields, including target recognition [15], speech recognition [16,17], image recognition [18][19][20], image restoration [21][22][23], audio classification [24,25], object detection [26][27][28][29][30], scene recognition [31], etc., but it has been considered "bad news" in text-based CAPTCHAs, by penetrating their security and making them vulnerable.…”

Section: Introduction (mentioning)

confidence: 98%

Securing IoT Devices: A Robust and Efficient Deep Learning with a Mixed Batch Adversarial Generation Process for CAPTCHA Security Verification

Dankwa

Yang

2021

Electronics

The Internet of Things environment (e.g., smart phones, smart televisions, and smart watches) ensures that the end user experience is easy, by connecting lives on web services via the internet. Integrating Internet of Things devices poses ethical risks related to data security, privacy, reliability and management, data mining, and knowledge exchange. An adversarial machine learning attack is a good practice to adopt, to strengthen the security of text-based CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), to withstand against malicious attacks from computer hackers, to protect Internet of Things devices and the end user’s privacy. The goal of this current study is to perform security vulnerability verification on adversarial text-based CAPTCHA, based on attacker–defender scenarios. Therefore, this study proposed computation-efficient deep learning with a mixed batch adversarial generation process model, which attempted to break the transferability attack, and mitigate the problem of catastrophic forgetting in the context of adversarial attack defense. After performing K-fold cross-validation, experimental results showed that the proposed defense model achieved mean accuracies in the range of 82–84% among three gradient-based adversarial attack datasets.

“…In [39], a CNN-based architecture was tested on the acted emotional speech dynamic database (AESDD) by analyzing the sequential time frames and its performance surpassed some existing techniques, which rely on handcrafted features. Very recently, Abbaschian et al. [40] comprehensively and systematically reviewed major deep learning approaches employed in the speech emotion recognition research, combined with their associated speech databases. This review indicates that CNN-based approaches play an important role in speech emotion recognition.…”

Section: Introduction (mentioning)

confidence: 99%

Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions

Nam

Lee

2021

Sensors

Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)–CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages. In the first stage, the DnCNN exploits the concept of residual learning to perform denoising; in the second stage, the CNN performs the classification. The classification results for real datasets show that the DnCNN–CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN–CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN–CNN has an overall accuracy of 59.3–76.6%, whereas the CNN has an overall accuracy of 39.4–58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.

“…While affective computing has received more attention in humans due to its widespread implications for e.g. neuromarketing (5), entertainment (5), monitoring mental health (6, 7) and human-robot interactions (8), the application of affective computing on farm animal welfare is however in its infancy and therefore, more multidisciplinary and exploratory studies are needed to further develop this highly promising field.…”

Section: Introduction (mentioning)

confidence: 99%

“…Human affective research is only just starting to understand the importance of multimodal approaches, and this knowledge should be used to develop models for non-human animals, too. Research on humans have resulted in complex systems that allow accurate sentiment analysis, preference detection (5) using qualitative and quantitative data such as facial expressions, body gestures, phonetic and acoustic properties of spoken language, word use and grammar in written text and more (8, 68). Different algorithms have been designed using computational methodologies such as hidden Markov chains, Bayesian networks and Gaussian mixture models to e.g.…”

Section: Introduction (mentioning)

confidence: 99%

The Use of Artificial Intelligence in Assessing Affective States in Livestock

Neethirajan

2021

Front. Vet. Sci.

In order to promote the welfare of farm animals, there is a need to be able to recognize, register and monitor their affective states. Numerous studies show that just like humans, non-human animals are able to feel pain, fear and joy amongst other emotions, too. While behaviorally testing individual animals to identify positive or negative states is a time and labor consuming task to complete, artificial intelligence and machine learning open up a whole new field of science to automatize emotion recognition in production animals. By using sensors and monitoring indirect measures of changes in affective states, self-learning computational mechanisms will allow an effective categorization of emotions and consequently can help farmers to respond accordingly. Not only will this possibility be an efficient method to improve animal welfare, but early detection of stress and fear can also improve productivity and reduce the need for veterinary assistance on the farm. Whereas affective computing in human research has received increasing attention, the knowledge gained on human emotions is yet to be applied to non-human animals. Therefore, a multidisciplinary approach should be taken to combine fields such as affective computing, bioengineering and applied ethology in order to address the current theoretical and practical obstacles that are yet to be overcome.
