2016 IEEE International Conference on Multimedia and Expo (ICME)
DOI: 10.1109/icme.2016.7552890
Inferring users' emotions for human-mobile voice dialogue applications

Cited by 11 publications (5 citation statements)
References 15 publications
“…In [22], a CNN- and Gated Recurrent Unit (GRU)-based neural network was proposed for speaker identification and verification. Additionally, a hybrid emotion inference model using an LSTM was proposed for inferring user emotion in a real-world voice-dialogue application, with a recurrent autoencoder used to pre-train the LSTM to improve accuracy [32]. Further, GMMs and DNNs were combined to identify distant accents in reverberant environments [26].…”
Section: Related Work (mentioning)
confidence: 99%
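A minimal sketch of the pre-training idea summarized above, assuming PyTorch and made-up feature dimensions (the original paper's architecture details are not reproduced here): a recurrent autoencoder learns to reconstruct acoustic feature sequences from unlabeled speech, and its encoder LSTM then initializes an emotion classifier that is fine-tuned on labeled data.

```python
import torch
import torch.nn as nn

class RecurrentAutoencoder(nn.Module):
    """Seq2seq autoencoder: an encoder LSTM compresses a feature
    sequence into its final state; a decoder LSTM reconstructs it."""
    def __init__(self, feat_dim=40, hidden_dim=128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):
        _, (h, c) = self.encoder(x)          # summarize the sequence
        dec_in = torch.zeros_like(x)         # decode from encoder state
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out)             # reconstructed features

class EmotionClassifier(nn.Module):
    """LSTM classifier whose recurrent weights are initialized
    from the pre-trained autoencoder's encoder."""
    def __init__(self, autoencoder, n_emotions=4):
        super().__init__()
        self.lstm = autoencoder.encoder      # reuse pre-trained LSTM
        self.fc = nn.Linear(128, n_emotions)

    def forward(self, x):
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])                # emotion logits

# Unsupervised pre-training on (here: random stand-in) features,
# then supervised fine-tuning on labeled emotion data.
ae = RecurrentAutoencoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
feats = torch.randn(8, 100, 40)              # (batch, frames, features)
loss = nn.functional.mse_loss(ae(feats), feats)
loss.backward(); opt.step()

clf = EmotionClassifier(ae)
logits = clf(feats)                          # shape: (8, 4)
```

The design choice the sketch illustrates is that reconstruction pre-training lets the encoder learn sequence structure from unlabeled audio before the (typically scarce) emotion labels are used.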
“…Speech emotion recognition is the process of identifying a speaker's emotion from his or her speech [5,6]. Its main task is to analyze human expressions in multiple modalities such as text, speech, or video and recognize the underlying emotions [7]. It is often used in customer-service scenarios to evaluate agents' quality of service (QoS).…”
Section: Basic Speech Technology (mentioning)
confidence: 99%
“…[18] performed sentiment analysis on audio data by first transcribing the spoken words and then running sentiment analysis on the transcript. Related to audio-based sentiment analysis is the task of estimating the emotional state of the speaker from audio input [19]. For the visual modality, the Facial Action Coding System [20] laid the groundwork for analyzing facial expressions and emotions.…”
Section: Related Work (mentioning)
confidence: 99%
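The transcribe-then-analyze pipeline attributed to [18] above can be sketched as follows. This is an illustration only, assuming the Hugging Face transformers library, the model names shown, and a hypothetical input file; the tooling actually used in [18] is not specified here.

```python
from transformers import pipeline

# Step 1: transcribe the spoken words (ASR model is an assumption,
# not the one used in [18]).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
transcript = asr("call_recording.wav")["text"]  # hypothetical audio file

# Step 2: run text sentiment analysis on the transcript.
sentiment = pipeline("sentiment-analysis")
result = sentiment(transcript)[0]

print(result["label"], result["score"])  # e.g. POSITIVE 0.98
```

Note that this text-only route discards prosodic cues (pitch, energy, tempo), which is exactly what the audio-based emotion estimation of [19] and the paper under review aim to exploit.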