2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII)
DOI: 10.1109/acii.2017.8273599
Segment-based speech emotion recognition using recurrent neural networks

Cited by 58 publications (36 citation statements). References 18 publications.
“…The same applies for LR (in WA from 0.8% to 1.0% and in UA from 0.3% to 1.0%) as well as for A-BLSTM (in WA from 0.1% to 0.7% and in UA from 0.2% to 0.7%). In accordance with our intuition [8], a segment-based approach using A-BLSTM surpasses all utterance-based ones in WA from 3.4% to 8.4% and in UA from 3.8% to 6.8% for all normalization schemes, when the fused set is used.…”
Section: Leave One Session Out (LOSO) (supporting)
confidence: 86%
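The WA and UA figures in the quotation above are the two accuracy metrics conventionally reported for SER: weighted accuracy is the overall fraction of correctly classified utterances, while unweighted accuracy is the recall averaged over emotion classes. A minimal sketch of how they are typically computed follows; the function name and toy labels are illustrative, not taken from the cited work.

import numpy as np

def wa_ua(y_true, y_pred, num_classes):
    # Weighted accuracy (WA): overall fraction of correct predictions.
    # Unweighted accuracy (UA): per-class recall averaged over classes,
    # so each emotion counts equally regardless of how frequent it is.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    wa = float(np.mean(y_true == y_pred))
    recalls = [np.mean(y_pred[y_true == c] == c)
               for c in range(num_classes) if np.any(y_true == c)]
    ua = float(np.mean(recalls))
    return wa, ua

# Toy usage with four emotion classes.
wa, ua = wa_ua([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 3, 0], num_classes=4)
print(f"WA={wa:.3f}  UA={ua:.3f}")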
“…Moreover, segment-based approaches have showcased that computation of statistical functionals over LLDs in appropriate timescales yields a significant performance improvement for SER systems [7], [8]. Specifically, in [8] statistical representations are extracted from overlapping segments, each one corresponding to a couple of words. The resulting sequence of segment representations is fed as input to a Long Short-Term Memory (LSTM) unit for SER classification.…”
Section: Introduction (mentioning)
confidence: 99%
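To make the pipeline sketched in this quotation more concrete, the following is a minimal illustration, assuming frame-level LLDs are already available as a NumPy array: statistical functionals are pooled over overlapping segments, and the segment sequence is fed to an LSTM classifier. Segment length, hop, functional set, and layer sizes are illustrative assumptions, not the configuration used in [8].

import numpy as np
import torch
import torch.nn as nn

def segment_functionals(lld_frames, seg_len=100, hop=50):
    # Pool frame-level LLDs (num_frames x num_llds) into statistical
    # functionals over overlapping segments; each segment yields
    # mean, std, min, max per LLD.  seg_len and hop are illustrative.
    segments = []
    for start in range(0, max(1, len(lld_frames) - seg_len + 1), hop):
        seg = lld_frames[start:start + seg_len]
        feats = np.concatenate([seg.mean(axis=0), seg.std(axis=0),
                                seg.min(axis=0), seg.max(axis=0)])
        segments.append(feats)
    return np.stack(segments)          # (num_segments, 4 * num_llds)

class SegmentLSTM(nn.Module):
    # LSTM over the sequence of segment representations; the last
    # hidden state is mapped to emotion-class logits.
    def __init__(self, input_dim, hidden_dim=128, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):              # x: (batch, num_segments, input_dim)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])

# Toy usage: 500 frames of 30 hypothetical LLDs for one utterance.
llds = np.random.randn(500, 30).astype(np.float32)
seq = torch.from_numpy(segment_functionals(llds)).unsqueeze(0)
logits = SegmentLSTM(input_dim=seq.shape[-1])(seq)
print(logits.shape)                    # torch.Size([1, 4])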
“…The proposed model outperforms the state-of-the-art models on both the improvised partition and the full IEMOCAP dataset, in terms of WA and UA. Results from [11,12,13,14] were rounded to one decimal digit.…”
Section: Results (mentioning)
confidence: 99%
“…Tzinis and Potamianos [4] ran a study on both local and global features and evaluated the performance at various time-scales (frame, phoneme, word, or utterance). The results show that global statistical features extracted from speech segments corresponding to the duration of a few words yield optimal accuracy using Recurrent Neural Networks (RNNs).…”
Section: Related Work (mentioning)
confidence: 99%
“…The appropriate time-scale selection is crucial to produce a high-performance SER system. Emotional features can be categorized into two types of time scale: 1) Low-Level Descriptors (LLDs), known as local features, and 2) statistical functionals, known as global features [4]. Local features capture the temporal dynamics of the prosody, while statistical values such as the minimum, maximum, mean, standard deviation, and slope of the contours describe the global features [5].…”
Section: Introduction (mentioning)
confidence: 99%
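As a small illustration of the global features listed in this quotation, the sketch below computes the named functionals (minimum, maximum, mean, standard deviation, and slope) over a single LLD contour such as a pitch track; the synthetic contour and the slope estimate via a least-squares linear fit are illustrative choices, not specified by the cited works.

import numpy as np

def global_functionals(contour):
    # Compute the global (utterance-level) functionals named above for
    # one LLD contour, e.g. a pitch or energy track.  The slope is the
    # least-squares linear trend of the contour.
    t = np.arange(len(contour))
    slope = np.polyfit(t, contour, 1)[0]     # linear-trend coefficient
    return {
        "min": contour.min(),
        "max": contour.max(),
        "mean": contour.mean(),
        "std": contour.std(),
        "slope": slope,
    }

# Toy example: a rising pitch-like contour with noise.
pitch = 120 + 0.2 * np.arange(300) + np.random.randn(300)
print(global_functionals(pitch))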