2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2017
DOI: 10.1109/apsipa.2017.8282026

End-to-end speech emotion recognition using multi-scale convolution networks

Abstract: Automatic speech emotion recognition is one of the challenging tasks in the machine learning community, mainly due to the significant variations across individuals expressing the same emotion cue. The success of emotion recognition with machine learning techniques depends primarily on the feature set chosen for learning. Formulating an appropriate feature set that caters to all the variations in emotion cues, however, is not a trivial task. Recent works on emotion recognition with deep learning techniques thus focu…

Cited by 8 publications (3 citation statements)
References 14 publications
“…Table 3 presents the performance of the proposed method in comparison with other state-of-the-art proposals for the SAVEE database. Sivanagaraja et al [36] propose a multiscale convolution network (MCNN) for SER using rawWav to train a DNN, which consists of three stages: (i) the signal transformation stage, (ii) the local convolution stage, and (iii) the global convolution stage. Latif et al [37] introduce a deep belief Network (DBN) with three RBM layers using the eGeMAPS features set.…”
Section: Discussion
type: mentioning
confidence: 99%
“…
Model                              Input features   SAVEE test accuracy (%)
MCNN, Sivanagaraja [36]            rawWav           50.28
DBN, Latif [37]                    eGeMAPS          56.76
DNN, Fayek, Lech and Cavedon [38]  Spectrogram      59.7
HMM, Chenchah and Lachiri [39]     LFCCs/MFCCs      45/61.25
Proposed method                    MFCCs            74
…”
Section: Models, Input Features, SAVEE Test Accuracy (%)
type: mentioning
confidence: 99%
“…ASR systems have been used not only to convert speech into text but also in a range of other studies. For example, there is work on identifying emotion in speech [50], [51], on speech analysis that enables automatic identification of negative affect and aggression [52], and on gender recognition [53]. In addition, accent recognition studies play an important role in improving the performance of ASR systems and also provide detailed information about the speaker [54].…”
Section: Literatür Taraması (Literature Review)
type: unclassified