BanglaSER: A speech emotion recognition dataset for the Bangla language

Das, Rakesh Kumar; Islam, Nahidul; Ahmed, Md. Rayhan; Islam, Salekul; Shatabda, Swakkhar; Islam, A.K.M. Muzahidul

doi:10.1016/j.dib.2022.108091

Cited by 14 publications

(7 citation statements)

References 15 publications

“…Neural network-based DL models have been investigated in recent SER studies. Among different DL models, CNN [1,4], and LSTM network [5] are the base of the proposed SER model. CNN is the most well-known DL architecture motivated by natural creatures' basic visual attention mechanism [4].…”

Section: Conflicts Of Interest (mentioning; confidence: 99%)

“…The LSTM is a kind of RNN made up of recurrently associated memory blocks, including memory cells with self-connections that record the network's temporal states [5]. It is mainly effective in learning sequential data in the form of time steps.…”

Section: Conflicts Of Interest (mentioning; confidence: 99%)
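The LSTM description in the statement above can be made concrete with a single cell update. This is a minimal NumPy sketch of the standard LSTM equations, not code from any of the cited works. The gate ordering inside the stacked weight matrices, the sizes (13 input features, 8 hidden units, 5 time steps) and the function names `lstm_step` and `lstm_forward` are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Rows are stacked as input, forget, output, candidate (a convention
    assumed for this sketch only).
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2 * H])       # forget gate: scales the cell's self-connection
    o = sigmoid(z[2 * H:3 * H])   # output gate
    g = np.tanh(z[3 * H:])        # candidate cell content
    c = f * c_prev + i * g        # memory cell carries the temporal state forward
    h = o * np.tanh(c)            # hidden state passed to the next step/layer
    return h, c

def lstm_forward(xs, W, U, b):
    """Run the cell over a sequence xs of shape (T, D); return hidden states (T, H)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for x in xs:                  # one iteration per time step
        h, c = lstm_step(x, h, c, W, U, b)
        hs.append(h)
    return np.stack(hs)

# Hypothetical sizes: 13 features per frame, 8 hidden units, 5 frames.
rng = np.random.default_rng(0)
D, H, T = 13, 8, 5
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
hs = lstm_forward(rng.normal(size=(T, D)), W, U, b)
print(hs.shape)  # (5, 8)
```

The forget gate `f` multiplies the previous cell state, which is the gated self-connection the statement refers to. A bidirectional LSTM, such as the Bi-LSTM in the REIS model described below, would run a second cell with separate weights over `xs[::-1]`, reverse its outputs back, and concatenate the two state sequences at each time step.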

“…The efficiency of emotional features obtained from speech significantly impacts SER performance [3]. Various DL models based on neural networks have been investigated for SER [4], which include Deep Belief Networks (DBN) [3], Convolutional Neural Network (CNN) [1,4], Recurrent Neural Network (RNN) [5] and Long Short-Term Memory (LSTM) network [5]. Prominent existing methods employ different feature extraction and signal transformation methods on speech signals, and then DL methods are applied to the transformed signal for emotion classification.…”

Section: Introduction (mentioning; confidence: 99%)
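As a concrete instance of the signal transformation step mentioned in the statement above, here is a minimal NumPy sketch of a magnitude short-time Fourier transform (STFT). It is illustrative only: the frame length (`n_fft=512`), hop (`hop=128`), Hann window, 16 kHz test tone and the helper name `stft_magnitude` are assumptions, not settings taken from the cited studies.

```python
import numpy as np

def stft_magnitude(signal, n_fft=512, hop=128):
    """Magnitude STFT of a 1-D signal: frame it, apply a Hann window, take the real FFT.

    Returns an array of shape (n_fft // 2 + 1, n_frames), i.e. frequency bins x time frames.
    n_fft and hop are illustrative defaults (assumptions), not values from the cited work.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:                      # pad very short inputs to one full frame
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # transpose to (bins, frames)

# Hypothetical input: one second of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
S = stft_magnitude(tone)
peak_bin = S[:, 0].argmax()
print(S.shape, peak_bin * sr / 512)  # (257, 122) 1000.0
```

Each column is the spectrum of one windowed frame, so a deep model receives a frequency-by-time map rather than the raw waveform. Real pipelines usually call a library such as librosa for this step instead of hand-rolled framing.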

Recognition of Emotion with Intensity from Speech Signal Using 3D Transformed Feature and Deep Learning

et al. 2022

Speech Emotion Recognition (SER), the extraction of emotional features with the appropriate classification from speech signals, has recently received attention for its emerging social applications. Emotional intensity (e.g., Normal, Strong) for a particular emotional expression (e.g., Sad, Angry) has a crucial influence on social activities. A person with intense sadness or anger may fall into severe disruptive action, eventually triggering a suicidal or devastating act. However, existing Deep Learning (DL)-based SER models only consider the categorization of emotion, ignoring the respective emotional intensity, despite its utmost importance. In this study, a novel scheme for Recognition of Emotion with Intensity from Speech (REIS) is developed using the DL model by integrating three speech signal transformation methods, namely Mel-frequency Cepstral Coefficient (MFCC), Short-time Fourier Transform (STFT), and Chroma STFT. The integrated 3D form of transformed features from three individual methods is fed into the DL model. Moreover, under the proposed REIS, both the single and cascaded frameworks with DL models are investigated. A DL model consists of a 3D Convolutional Neural Network (CNN), Time Distribution Flatten (TDF) layer, and Bidirectional Long Short-term Memory (Bi-LSTM) network. The 3D CNN block extracts convolved features from 3D transformed speech features. The convolved features were flattened through the TDF layer and fed into Bi-LSTM to classify emotion with intensity in a single DL framework. The 3D transformed feature is first classified into emotion categories in the cascaded DL framework using a DL model. Then, using a different DL model, the intensity level of the identified categories is determined. The proposed REIS has been evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) benchmark dataset, and the cascaded DL framework is found to be better than the single DL framework. 
The proposed REIS method has shown remarkable recognition accuracy, outperforming related existing methods.
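The abstract states that MFCC, STFT and Chroma STFT outputs are integrated into a 3D form, but not how maps with different numbers of rows (for example 12 chroma bins versus several hundred STFT bins) are aligned. The sketch below assumes one plausible approach: linear interpolation of each map to a common height, followed by stacking along a new channel axis. The helper names, the target height of 64, the frame count and the random stand-in features are all hypothetical.

```python
import numpy as np

def resize_rows(feature, height):
    """Linearly interpolate a (bins, frames) map along the bin axis to `height` rows.

    Assumption of this sketch: plain linear interpolation, no anti-aliasing.
    """
    bins, frames = feature.shape
    src = np.linspace(0.0, 1.0, bins)
    dst = np.linspace(0.0, 1.0, height)
    return np.stack([np.interp(dst, src, feature[:, j]) for j in range(frames)], axis=1)

def stack_features(mfcc, stft_mag, chroma, height=64):
    """Combine three 2-D transforms into one 3-D array of shape (3, height, frames)."""
    frames = {m.shape[1] for m in (mfcc, stft_mag, chroma)}
    if len(frames) != 1:
        raise ValueError("all transforms must share the same number of frames")
    # One channel per transform, each resized to the same number of rows.
    return np.stack([resize_rows(m, height) for m in (mfcc, stft_mag, chroma)], axis=0)

# Random stand-ins with hypothetical shapes: 40 MFCCs, 257 STFT bins, 12 chroma bins.
rng = np.random.default_rng(0)
T = 122
mfcc = rng.normal(size=(40, T))
stft_mag = np.abs(rng.normal(size=(257, T)))
chroma = rng.random(size=(12, T))
cube = stack_features(mfcc, stft_mag, chroma)
print(cube.shape)  # (3, 64, 122)
```

The result holds one channel per transform over a shared frequency-by-time grid. Because the three transforms live on very different numeric scales, a real pipeline would typically normalize each map before stacking; whether REIS does so is not stated in the abstract.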

“…As mentioned, the dataset is a Bengali dataset. With the advancement of natural language processing, some other research has been conducted on Bengali datasets for NLP purposes [4].…”

Section: Data Description (mentioning; confidence: 99%)

A Bengali news and public opinion dataset from YouTube

Chowdhury, Islam, Shatabda 2024

Data in Brief

“…Except laboratory curated ones, material sources of EmoFilm [6], VESUS [7] and EmoSpeech [8] are film or in wild. Only a few laboratory curated datasets are available for Bangla language, such as SUBESCO [9] and BanglaSER [10]. As an example, the 7000 samples of popular Bangla SUBESCO [9] dataset are developed with only 10 speech dialogs repeatedly reading by 20 actors.…”

Section: Data Description (mentioning; confidence: 99%)

KBES: A dataset for realistic Bangla speech emotion recognition with intensity level

Billah, Sarker, Akhand 2023

Data in Brief
