2021
DOI: 10.21608/ejle.2020.49690.1016
Arabic Automatic Speech Recognition Based on Emotion Detection

Abstract: This work presents novel emotion recognition via automatic speech recognition (ASR) using a deep feed-forward neural network (DFFNN) for Arabic speech. We present results for the recognition of three emotions: happy, angry, and surprised. The Arabic Natural Audio Dataset (ANAD) is used. Twenty-five low-level descriptors (LLDs) are extracted from the audio signals. Different combinations of extracted features are examined. Also, the effect of using the principal component analysis (PCA) technique for dimensi…
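As an illustration only (not the authors' released code), a minimal sketch of the pipeline the abstract describes could look like the following: PCA-based dimensionality reduction over pre-extracted LLD features, followed by a deep feed-forward neural network classifier for the three emotion classes. The feature matrix shape, number of PCA components, and layer sizes below are hypothetical placeholders.

```python
# Hypothetical sketch of the PCA + DFFNN pipeline described in the abstract.
# Extraction of the 25 low-level descriptors (LLDs) is assumed to have been
# done already; X holds one feature vector per utterance and y the emotion
# label (happy / angry / surprised).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from tensorflow import keras

num_classes = 3  # happy, angry, surprised

# Placeholder data; replace with real ANAD features and labels.
X = np.random.rand(1000, 500).astype("float32")  # hypothetical feature matrix
y = np.random.randint(0, num_classes, size=1000)

# Dimensionality reduction with PCA (component count is an assumption).
pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.2, stratify=y, random_state=42
)

# Deep feed-forward network; layer sizes are illustrative, not the paper's.
model = keras.Sequential([
    keras.layers.Input(shape=(X_reduced.shape[1],)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32,
          validation_data=(X_test, y_test))
```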

Citations: cited by 1 publication (2 citation statements)
References: 33 publications (45 reference statements)
“…The researchers used the Arabic Natural Audio Dataset (ANAD), which was previously employed in [31], where novel emotion recognition for Arabic speech using a deep feed-forward neural network (DFFNN) achieves 98.56% accuracy with PCA and 98.33% with combined features on the ANAD dataset. The authors of [39] evaluate three speaker traits (gender, emotion, and dialect) from Arabic speech, employing multitask learning (MTL).…”
Section: Results
confidence: 99%
“…For instance, in [19], [29], and [30], researchers utilized speech, text, and mocap data, including sub-modes such as facial expressions, hand gestures, and head rotations, to accurately identify emotions. Furthermore, [31] introduced a groundbreaking transformer-based model, multimodal transformers for audio-visual emotion recognition, overcoming the limitations of RNNs and LSTMs in capturing long-term dependencies. This model includes three transformer branches: audio-video cross-attention, video self-attention, and audio self-attention.…”
Section: Related Work
confidence: 99%
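The citation statement above describes a three-branch architecture without further detail. Purely as a hedged sketch of what such a branch layout might look like (not the cited model; the fusion step, sequence lengths, and layer sizes below are assumptions), a minimal Keras version could be:

```python
# Hypothetical sketch of a three-branch multimodal transformer front end:
# audio self-attention, video self-attention, and audio-video cross-attention.
# All dimensions and the concatenation-based fusion are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_three_branch_model(seq_len_a=100, seq_len_v=50,
                             d_model=64, num_heads=4, num_classes=3):
    audio_in = layers.Input(shape=(seq_len_a, d_model), name="audio_features")
    video_in = layers.Input(shape=(seq_len_v, d_model), name="video_features")

    # Audio self-attention branch.
    audio_sa = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(
        audio_in, audio_in)
    # Video self-attention branch.
    video_sa = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(
        video_in, video_in)
    # Audio-video cross-attention branch (audio queries attend to video).
    cross = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)(
        audio_in, video_in)

    # Pool each branch over time and fuse by concatenation (an assumption;
    # the cited model's fusion strategy is not specified in the statement).
    pooled = [layers.GlobalAveragePooling1D()(x)
              for x in (audio_sa, video_sa, cross)]
    fused = layers.Concatenate()(pooled)
    out = layers.Dense(num_classes, activation="softmax")(fused)
    return Model(inputs=[audio_in, video_in], outputs=out)

model = build_three_branch_model()
model.summary()
```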