Abstract: Speech emotion recognition (SER) has grown into one of the most active research topics in computational linguistics over the last two decades. Speech being the primary medium of communication, understanding the emotional state of humans from speech and responding accordingly have made SER systems an essential part of the human-computer interaction (HCI) field. Although a few review works have been carried out for SER, none of them discusses the development of SER systems for the Indo-Ary…
“…Analysis shows that there is no prior standardized multimodal emotion dataset containing speech and text recordings of native speakers of the Punjabi language. Figure 2 shows an analysis of research works done for some of the Indian languages in the last two decades (21). …”
Recent research has focused extensively on employing Deep Learning (DL) techniques, particularly Convolutional Neural Networks (CNN), for Speech Emotion Recognition (SER). This study addresses the burgeoning interest in leveraging DL for SER, specifically focusing on Punjabi language speakers. The paper presents a novel approach to constructing and preprocessing a labeled speech corpus using diverse social media sources. By utilizing spectrograms as the primary feature representation, the proposed algorithm effectively learns discriminative patterns for emotion recognition. The method is evaluated on a custom dataset derived from various Punjabi media sources, including films and web series. Results demonstrate that the proposed approach achieves an accuracy of 69%, surpassing traditional methods like decision trees, Naïve Bayes, and random forests, which achieved accuracies of 49%, 52%, and 61% respectively. Thus, the proposed method improves accuracy in recognizing emotions from Punjabi speech signals.
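The spectrogram front end described above can be sketched with standard tools. Everything here is illustrative: the synthetic signal, window sizes, and the toy 3×3 kernel are assumptions, and the actual CNN classifier that would be trained on such feature maps is not shown.

```python
# Sketch of a log-spectrogram front end for SER, assuming 16 kHz mono audio.
import numpy as np
from scipy import signal

def log_spectrogram(waveform, sr=16000, nperseg=400, noverlap=240):
    """Return a log-magnitude spectrogram (freq_bins x frames)."""
    freqs, times, spec = signal.spectrogram(
        waveform, fs=sr, nperseg=nperseg, noverlap=noverlap)
    return np.log(spec + 1e-10)  # log compression stabilises the dynamic range

# Synthetic one-second utterance stand-in: a 440 Hz tone plus light noise.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(sr)

feat = log_spectrogram(wave, sr=sr)
print(feat.shape)  # (freq_bins, frames)

# A single 3x3 convolution: the elementary operation a CNN layer applies
# to such a spectrogram to learn local time-frequency patterns.
kernel = np.ones((3, 3)) / 9.0            # toy smoothing filter
fmap = signal.convolve2d(feat, kernel, mode='valid')
print(fmap.shape)
```

In a full system, stacks of such learned convolutions followed by pooling and a softmax layer would map the spectrogram to one of the emotion classes.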
“…For instance, speech signals may typically be obtained more quickly and affordably than many other biological signals (such as the EKG). Because of this, most researchers are drawn to speech-emotion recognition (SER) (2). For the SER system to be successful, the following three challenges must be addressed:…”
Section: Basics Of Emotion Recognition
“…The result of this achieved an average test accuracy rate of 90%. Some studies are carried out for the development of an automatic SER system for Indo-Aryan and Dravidian languages (2). This paper presents a brief study of the prominent databases available for SER experiments.…”
Objectives: The present work aims to investigate the recognition of emotion from Assamese speech. Methods: This work presents a method based on the Gaussian Mixture Model (GMM) classifier, with Mel-frequency cepstral coefficients (MFCC) as the feature-extraction technique, for emotion recognition from Assamese speech. Findings: We conducted experiments considering four emotions: Angry, Happy, Neutral and Sad. The database consists of emotional speech samples collected manually from 20 speakers, together with some standard samples available on the internet. The speakers are from different districts of Assam and use different dialects of the Assamese language, such as Western (Kamrupi), Central, and Eastern; they fall in the age group of 18-26 years. The field survey consists of recordings done at Dibrugarh University and outside the campus. After the GMM training and testing process, the accuracy obtained is 51.25%. The experiments confirmed that angry and happy emotions have high energy in the higher frequency region, whereas neutral and sad emotions have low energy in the higher frequency region. Novelty: This work will help predict the attitudes and actions of different speakers based on their manner of speaking. In addition, it will help in other aspects of human-machine interaction in daily life. The Assamese emotional speech database used in the work was self-collected from different dialect groups to understand the variability of emotions from a dialectal perspective.
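The per-emotion GMM classification scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: real 13-dimensional MFCC frames would replace the synthetic clusters, and a full GMM would add mixture components, whereas here a single diagonal Gaussian per emotion stands in as the one-component special case.

```python
# Sketch of per-emotion Gaussian models with maximum-likelihood decoding,
# the one-component special case of the GMM classifier described above.
import numpy as np

rng = np.random.default_rng(0)
emotions = ["angry", "happy", "neutral", "sad"]

# Stand-in training data: one well-separated cluster of 13-dim
# "MFCC frames" per emotion (means 0, 4, 8, 12).
train = {e: rng.normal(loc=4.0 * i, scale=1.0, size=(200, 13))
         for i, e in enumerate(emotions)}

# Fit a diagonal Gaussian per class (mean and per-dimension variance).
models = {e: (X.mean(axis=0), X.var(axis=0)) for e, X in train.items()}

def loglik(frames, mean, var):
    """Total diagonal-Gaussian log-likelihood of a frame matrix."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)

def classify(frames):
    """Choose the emotion whose model best explains the frames."""
    return max(emotions, key=lambda e: loglik(frames, *models[e]))

# An unseen "utterance" drawn near the 'sad' cluster (mean 12).
test_utt = rng.normal(loc=12.0, scale=1.0, size=(50, 13))
print(classify(test_utt))
```

A practical system would train one multi-component GMM per emotion (e.g. with EM) on MFCC frames and score a test utterance against each model in the same way.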