Evaluation of an Arabic Speech Corpus of Emotions: A Perceptual and Statistical Analysis

Meftah, Ali H.; Alotaibi, Yousef Ajami; Selouani, Sid‐Ahmed

doi:10.1109/access.2018.2881096

Cited by 28 publications

(19 citation statements)

References 20 publications


“…From the results, one can see that the average accuracy is slightly better for males at 98.04% compared to females at 96.77%. This result is similar to those presented in our previous work [49]. Table 7 presents the normalized confusion matrix for English using the EPST corpus, which includes five females and three males (CC, MF, and CL).…”

Section: Results (supporting)

confidence: 87%
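The statement above reports per-gender average accuracies next to a normalized confusion matrix, but it does not say how the averaging is done. The sketch below assumes that rows are true speakers and columns are predicted speakers, that each row is divided by its total, and that "average accuracy" is the mean of the resulting diagonal. The function names and counts are hypothetical and do not come from the cited paper.

```python
from typing import List


def row_normalize(counts: List[List[int]]) -> List[List[float]]:
    """Divide each row (one true speaker) by its total so every row sums to 1."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("every true class needs at least one test sample")
        normalized.append([c / total for c in row])
    return normalized


def average_accuracy(counts: List[List[int]]) -> float:
    """Mean of per-speaker accuracies, i.e. the diagonal of the normalized matrix."""
    norm = row_normalize(counts)
    return sum(norm[i][i] for i in range(len(norm))) / len(norm)


# Hypothetical test counts for three speakers (rows = true, columns = predicted).
counts = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 1, 49],
]
print(f"{average_accuracy(counts):.2%}")  # 96.00%
```

Averaging the normalized diagonal weights every speaker equally, even when speakers contribute different numbers of test utterances. Pooling all correct predictions over all utterances would give a different figure whenever the row totals differ.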


Speaker Identification in Different Emotional States in Arabic and English

et al. 2020

Self Cite


Speaker recognition is an important application of digital speech processing. However, a major challenge degrading the robustness of speaker-recognition systems is variation in the emotional states of speakers, such as happiness, anger, sadness, or surprise. In this paper, we propose a speaker recognition system corresponding to three states, namely emotional, neutral, and with no consideration for a speaker's state (i.e., the speaker can be in an emotional state or neutral state), for two languages: Arabic and English. Additionally, cross-language speaker recognition was applied in emotional, neutral, and (emotional + neutral) states. Convolutional neural network and long short-term memory models were used to design a convolutional recurrent neural network (CRNN) main system. We also investigated the use of linearly spaced spectrograms as speech-feature inputs. The proposed system utilizes the KSUEmotions, emotional prosody speech and transcripts, WEST POINT, and TIMIT corpora. The CRNN system exhibited accuracies as high as 97.4% and 97.18% for Arabic and English emotional speech inputs, respectively, and 99.89% and 99.4% for Arabic and English neutral speech inputs, respectively. For the cross-language program, the overall CRNN system accuracy was as high as 91.83%, 99.88%, and 95.36% for emotional, neutral, and (emotional + neutral) states, respectively.
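The abstract mentions linearly spaced spectrograms as CRNN inputs but gives no framing parameters. The sketch below assumes a 16 kHz signal, which is the KSUEmotions sampling rate quoted in the next citation statement. It also assumes 25 ms Hann-windowed frames (400 samples) and a 10 ms hop (160 samples). These frame settings and the function name `linear_spectrogram` are hypothetical. The sketch uses a naive DFT so that it needs only the standard library. A practical pipeline would use an FFT instead.

```python
import cmath
import math
from typing import List, Tuple


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def linear_spectrogram(
    signal: List[float],
    sample_rate: int = 16000,
    frame_len: int = 400,  # 25 ms at 16 kHz (assumed)
    hop: int = 160,  # 10 ms at 16 kHz (assumed)
) -> Tuple[List[float], List[List[float]]]:
    """Return (bin frequencies in Hz, magnitude frames) on a linear frequency axis."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1  # non-negative DFT bins only
    # Bin k sits at k * sample_rate / frame_len Hz: equal spacing, unlike a mel scale.
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            # Naive DFT for clarity; an FFT gives the same values faster.
            acc = sum(
                seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return freqs, frames


# A 1 kHz test tone lasting 50 ms.
tone = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(800)]
freqs, spec = linear_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(freqs), freqs[peak_bin])  # 3 201 1000.0
```

With these settings the bins are spaced 40 Hz apart, so the 1 kHz tone peaks in bin 25. The resulting time-by-frequency grid is the kind of 2-D input that convolutional layers can process before a recurrent layer models the sequence. The exact input shape used by the paper's CRNN is not given here.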



“…The sampling rate was set to 16,000 Hz. For evaluation purposes, a blind human perceptual test was conducted in both phases [49]. In Phase 1, the total duration of recording was 2 h and 55 min.…”

Section: Selected Speech Corpora, A. KSUEmotions Corpus (mentioning)

confidence: 99%

Speaker Identification in Different Emotional States in Arabic and English

et al. 2020

Self Cite


“…Overall raw hit rate for Phase 1 was 71%, and for Phase 2, it was 80%. If we look at the perceived hit rates for other relevant audio-only datasets including: Arabic database: 80% for 800 sentences [21], EMOVO: 80% for 588 files [13], German database: 85% for 800 sentences [17], MES-P: 86.54% for 5376 stimuli [23], Indonesian speech corpus: 62% for 1357 audios [24], Montreal affective voices: 69% for 90 stimuli [54], Portuguese dataset: 75% for 190 sentences [55], RAVDESS: 62.5% for 1440 audio-only speech [17]; these results confirm that the perceptual hit rate of SUBESCO was comparable to existing emotional speech sets. Unbiased hit rates were also reported along with raw hit rates to address false alarms.…”

Section: Discussion (mentioning)

confidence: 99%
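The statement above distinguishes raw hit rates from unbiased hit rates, which are meant to correct for false alarms. A common definition of the unbiased hit rate is Wagner's H_u. It squares the number of correct responses for a category and divides that by the product of two counts: how often the category was presented and how often raters chose it. A label that raters overuse is therefore penalized. The sketch below assumes this definition, because the snippet does not show SUBESCO's exact computation. The function name and the ratings are hypothetical.

```python
from typing import List, Tuple


def hit_rates(confusion: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Raw and unbiased (Wagner-style H_u) hit rates per emotion category.

    confusion[i][j] = number of times a clip intended as emotion i was labelled j.
    """
    n = len(confusion)
    presented = [sum(row) for row in confusion]  # stimuli per intended emotion
    if any(p == 0 for p in presented):
        raise ValueError("every emotion must be presented at least once")
    chosen = [sum(confusion[r][c] for r in range(n)) for c in range(n)]  # uses of each label
    raw = [confusion[i][i] / presented[i] for i in range(n)]
    unbiased = [
        confusion[i][i] ** 2 / (presented[i] * chosen[i]) if chosen[i] else 0.0
        for i in range(n)
    ]
    return raw, unbiased


# Hypothetical ratings for two emotions, "anger" and "neutral".
confusion = [
    [8, 2],  # anger clips: 8 judged anger, 2 judged neutral
    [4, 6],  # neutral clips: 4 judged anger, 6 judged neutral
]
raw, hu = hit_rates(confusion)
print([round(x, 3) for x in raw], [round(x, 3) for x in hu])  # [0.8, 0.6] [0.533, 0.45]
```

In this example, anger has a raw hit rate of 0.8, but its unbiased rate falls to about 0.53. The drop occurs because raters also chose anger for four neutral clips, and H_u counts those false alarms against the category.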

SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla

et al. 2021


SUBESCO is an audio-only emotional speech corpus for the Bangla language. The total duration of the corpus is in excess of 7 hours, containing 7000 utterances, and it is the largest emotional speech corpus available for this language. Twenty native speakers participated in the gender-balanced set, each recording 10 sentences simulating seven targeted emotions. Fifty university students participated in the evaluation of this corpus. Each audio clip of this corpus, except those of the Disgust emotion, was validated four times by male and female raters. Raw hit rates and unbiased hit rates were calculated, producing scores above the chance level of responses. The overall recognition rate was reported to be above 70% for human perception tests. Kappa statistics and intra-class correlation coefficient scores indicated a high level of inter-rater reliability and consistency of this corpus evaluation. SUBESCO is an Open Access database, licensed under Creative Commons Attribution 4.0 International, and can be downloaded free of charge from the web link: https://doi.org/10.5281/zenodo.4526477.
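The abstract cites Kappa statistics for inter-rater reliability but does not say which variant was used. Each clip (other than Disgust clips) was rated four times. Fleiss' kappa handles a fixed number of raters per item, so it is a plausible choice, and the sketch below assumes it. The function name and the rating counts are hypothetical. The intra-class correlation coefficient is not sketched here.

```python
from typing import List


def fleiss_kappa(ratings: List[List[int]]) -> float:
    """Fleiss' kappa.

    ratings[i][j] = number of raters who put item i into category j; every item
    must have the same number of raters.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("need the same number (>= 2) of raters for every item")
    n_cats = len(ratings[0])
    # Share of all assignments that went to each category.
    p = [sum(row[j] for row in ratings) / (n_items * n_raters) for j in range(n_cats)]
    # Observed agreement per item: fraction of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(per_item) / n_items
    p_e = sum(x * x for x in p)  # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical: 3 clips, 4 raters each, 3 emotion labels.
ratings = [
    [4, 0, 0],
    [0, 4, 0],
    [2, 1, 1],
]
print(round(fleiss_kappa(ratings), 3))  # 0.512
```

A value of 1 means perfect agreement, and 0 means agreement no better than chance given how often each label was used.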


“…Since the last decade, deep learning has arisen as a new attractive area of machine learning, and ever since has been examined and utilized in a range of different research topics [1]. Deep learning consists of a multiple of machine learning algorithms fed with inputs in the form of multiple layered models.…”

Section: Introduction (mentioning)

confidence: 99%

Speech Recognition Using Deep Neural Networks: A Systematic Review

et al. 2019


Over the past decades, a tremendous amount of research has been done on the use of machine learning for speech processing applications, especially speech recognition. However, in the past few years, research has focused on utilizing deep learning for speech-related applications. This new area of machine learning has yielded far better results when compared to others in a variety of applications including speech, and thus became a very attractive area of research. This paper provides a thorough examination of the different studies that have been conducted since 2006, when deep learning first arose as a new area of machine learning, for speech applications. A thorough statistical analysis is provided in this review which was conducted by extracting specific information from 174 papers published between the years 2006 and 2018. The results provided in this paper shed light on the trends of research in this area as well as bring focus to new research topics.

Index Terms: Speech recognition, deep neural network, systematic review.

