Spectral Features for Emotional Speaker Recognition

Sandhya, P.; V, Spoorthy.; Koolagudi, Shashidhar G.; Sobhana, N. V.

doi:10.1109/icaecc50550.2020.9339502

Cited by 21 publications

(13 citation statements)

References 18 publications


“…This feature [24] is referred as global energy of audio signal, which is estimated by, [equation not recovered] where, [symbol] defines signal amplitude at [symbol] amplitude, [symbol] symbolizes quantity of frames in sample length, and [symbol] specifies root mean square feature.…”

Section: Developed Covid-19 Detection Model Based On Hybrid Optimization (mentioning)

confidence: 99%
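
The equation in the statement above did not survive extraction. As a rough illustration only, the textbook root-mean-square (RMS) energy of a frame is the square root of the mean of the squared sample amplitudes. The sketch below assumes that standard definition; the function name `rms_energy` and the sample values are hypothetical, and the citing paper's exact notation may differ.

```python
import math


def rms_energy(samples):
    """Root-mean-square energy of a sequence of amplitude samples.

    Assumes the standard definition sqrt(mean(x^2)); this is not
    necessarily the exact formula used in the citing paper.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    mean_square = sum(x * x for x in samples) / len(samples)
    return math.sqrt(mean_square)


# Hypothetical frame: a square wave of amplitude 0.5.
frame = [0.5, -0.5, 0.5, -0.5]
print(rms_energy(frame))
```

For a constant-magnitude signal like this one, the RMS energy equals the magnitude itself, which makes the example easy to check by hand.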

“…This feature defines the ratio of quantity of times the audio sample alters the value from negative to positive or else positive to negative to frame dimension [24]. The zero-crossing rate feature is denoted as [symbol].…”

Section: Developed Covid-19 Detection Model Based On Hybrid Optimization (mentioning)

confidence: 99%
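
The zero-crossing rate described in the statement above can be sketched as a count of sign changes between adjacent samples, divided by the frame length. This is a minimal illustration under two assumptions that the statement does not spell out: a sample equal to zero is treated as non-negative, and the count is normalised by the number of samples in the frame. The function name `zero_crossing_rate` is hypothetical.

```python
def zero_crossing_rate(samples):
    """Sign changes between adjacent samples, divided by frame length.

    Assumptions: zero counts as non-negative, and the normaliser is the
    number of samples in the frame (conventions vary between toolkits).
    """
    if not samples:
        return 0.0
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / len(samples)


# Hypothetical frame that changes sign at every step: 3 crossings, 4 samples.
print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
```

Some implementations divide by the number of adjacent pairs instead of the frame length, so values from different toolkits may differ slightly for short frames.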

Design and development of hybrid optimization enabled deep learning model for COVID-19 detection with comparative analysis with DCNN, BIAT-GRU, XGBoost

Dar, Srivastava, Lone. 2022. Computers in Biology and Medicine.


“…The input features for deep-learning-based SER models are generally extracted from the time or spectrum axis in units of speech segments or frames. There are various LLDs and high-level statistical functions of the LLD single features [19,20,31-33]. The spectrum LLD features of speech signals include logMel filter-banks and mel-frequency cepstral coefficients (MFCC).…”

Section: Related Work (mentioning)

confidence: 99%

“…The spectrum LLD features of speech signals include logMel filter-banks and mel-frequency cepstral coefficients (MFCC). Zero-crossing rates and signal energies are representative time-domain features [27-30], whereas spectral roll-off and spectral centroid are classified as spectral parameters [33]. A set of multiple single features for acoustic signal processing, such as the extended Geneva Minimalistic Acoustic Parameter Set [34] and the INTERSPEECH 2010 Paralinguistic Challenge (IS10) dataset [35], is now accessible from open-source frameworks, such as OpenSmile [36].…”

Section: Related Work (mentioning)

confidence: 99%
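
To make the two spectral parameters named in the statement above concrete, here is a small sketch of spectral centroid and spectral roll-off computed from one frame. It is an illustration under stated assumptions, not the citing paper's pipeline: the spectrum comes from a naive O(n²) DFT written with the standard library, the roll-off fraction of 0.85 is a common but arbitrary choice, and all function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz (0.0 for a silent frame)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest bin frequency at which the cumulative magnitude reaches
    `fraction` of the total (fraction=0.85 is an assumed default)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= fraction * total:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n


# Hypothetical frame: a 2 Hz sine sampled at 16 Hz for 16 samples.
n, sample_rate = 16, 16
frame = [math.sin(2 * math.pi * 2 * t / sample_rate) for t in range(n)]
mags = magnitude_spectrum(frame)
print(spectral_centroid(mags, sample_rate, n))
print(spectral_rolloff(mags, sample_rate, n))
```

For a pure tone, both values should land on or very near the tone's frequency, since essentially all of the magnitude sits in a single bin. In practice an FFT routine and a window function would replace the naive DFT.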

Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets

Noh, Jeong, Lim, et al. 2021. Sensors.


Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To distribute SER models to real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of the SER model for an unseen target domain. This study proposes a multi-path and group-loss-based network (MPGLN) for SER to support multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a transferred feature extractor from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously based on multiple losses according to the association of emotion labels in the discrete and dimensional models. For the evaluation of the MPGLN SER as applied to multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-speaking Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed 3.7% and 3.5% improvements, respectively, of the F1 score when comparing the performance of MPGLN SER with a baseline SER model that uses a temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.


“…In this paper [21], the speaker is recognized in the emotional environment. Spectral features are extracted from the data and are classified.…”

Section: Related Work (mentioning)

confidence: 99%

An Analysis of the Impact of Spectral Contrast Feature in Speech Emotion Recognition

Kumar, Thiruvenkadam. 2021. Int. J. Recent Contrib. Eng. Sci. IT.


Feature extraction is an integral part in speech emotion recognition. Some emotions become indistinguishable from others due to high resemblance in their features, which results in low prediction accuracy. This paper analyses the impact of spectral contrast feature in increasing the accuracy for such emotions. The RAVDESS dataset has been chosen for this study. The SAVEE dataset, CREMA-D dataset and JL corpus dataset were also used to test its performance over different English accents. In addition to that, EmoDB dataset has been used to study its performance in the German language. The use of spectral contrast feature has increased the prediction accuracy in speech emotion recognition systems to a good degree as it performs well in distinguishing emotions with significant differences in arousal levels, and it has been discussed in detail.

