2010
DOI: 10.1007/s10772-010-9068-y

Emotion recognition and adaptation in spoken dialogue systems

Abstract: The involvement of emotional states in intelligent spoken human-computer interfaces has evolved into a recent field of research. In this article we describe the enhancements and optimizations of a speech-based emotion recognizer jointly operating with automatic speech recognition. We argue that knowledge about the textual content of an utterance can improve the recognition of the emotional content. Having outlined the experimental setup, we present results and demonstrate the capability of a postprocessing algorithm…
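The abstract's core idea, reusing the ASR transcript to refine an acoustic emotion hypothesis in a postprocessing step, can be sketched as follows. This is a hypothetical illustration: the emotion lexicon, the linear fusion rule, and the weight alpha are assumptions for the sketch, not the authors' actual algorithm.

```python
# Sketch: re-weight acoustic emotion posteriors with scores derived
# from the ASR transcript. All names and values are illustrative.
EMOTIONS = ["anger", "happiness", "neutral", "sadness"]

# Toy emotion lexicon: words that hint at an emotional state (assumed).
LEXICON = {"terrible": "anger", "great": "happiness", "unfortunately": "sadness"}

def text_scores(transcript: str) -> dict:
    """Turn lexicon hits in the ASR transcript into a score per emotion."""
    counts = {e: 0.0 for e in EMOTIONS}
    for word in transcript.lower().split():
        if word in LEXICON:
            counts[LEXICON[word]] += 1.0
    total = sum(counts.values())
    if total == 0:  # no cue words: uniform, i.e. uninformative
        return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}
    return {e: c / total for e, c in counts.items()}

def fuse(acoustic: dict, transcript: str, alpha: float = 0.7) -> str:
    """Linearly interpolate acoustic posterior and text score (alpha assumed)."""
    text = text_scores(transcript)
    fused = {e: alpha * acoustic[e] + (1 - alpha) * text[e] for e in EMOTIONS}
    return max(fused, key=fused.get)

acoustic_posterior = {"anger": 0.35, "happiness": 0.30, "neutral": 0.25, "sadness": 0.10}
print(fuse(acoustic_posterior, "this is terrible"))  # -> "anger"
```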

Cited by 56 publications (41 citation statements).
References 20 publications (13 reference statements).
“…An emotion recognizer that may operate jointly with an automatic speech recognizer is examined by Pittermann et al. [46]. The feature vector comprises MFCCs (along with their first- and second-order differences), intensity, and three formants, along with pitch and pitch statistics, namely minimum, mean, maximum, deviation, and range.…”
Section: Emotion Recognition on EmoDB
confidence: 99%
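The feature set described above can be approximated with standard tooling. The sketch below, assuming librosa and illustrative parameter values (13 MFCCs, a 75-400 Hz pitch search range), computes MFCCs with their first- and second-order differences, RMS energy as an intensity proxy, and the pitch statistics; formant extraction (e.g. via LPC) is omitted for brevity, and none of this reflects the exact configuration in [46].

```python
import numpy as np
import librosa

def utterance_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)

    # MFCCs plus first- and second-order differences, averaged over frames.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)

    # Intensity approximated by frame-wise RMS energy.
    rms = librosa.feature.rms(y=y)

    # Pitch track and its statistics: minimum, mean, maximum, deviation, range.
    f0, _, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    pitch_stats = [np.nanmin(f0), np.nanmean(f0), np.nanmax(f0),
                   np.nanstd(f0), np.nanmax(f0) - np.nanmin(f0)]

    # Utterance-level vector: mean-pooled frame features plus pitch statistics.
    return np.concatenate([mfcc.mean(axis=1), d1.mean(axis=1),
                           d2.mean(axis=1), rms.mean(axis=1), pitch_stats])
```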
“…No feature selection technique is applied, while HMMs are employed as classifiers in a speaker-dependent protocol, contrary to our approach, which applies feature selection and a speaker-independent protocol. Also, speech recognition is not a prerequisite in this work, whereas the stated set of features in [46] is a subset of the feature vector computed by the authors. However, in both cases the authors compute the first- and second-order differences of the features in order to capture their temporal evolution.…”
Section: Emotion Recognition on EmoDB
confidence: 99%
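The speaker-independent protocol contrasted here is typically realized as leave-one-speaker-out cross-validation, so that no test speaker is ever seen in training. A minimal sketch, assuming scikit-learn, toy data, and an SVM as a stand-in for the HMM classifiers used in [46]:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

X = np.random.rand(100, 40)                    # 100 utterances, 40-dim features
y = np.random.randint(0, 4, size=100)          # 4 emotion classes
speakers = np.random.randint(0, 10, size=100)  # speaker id per utterance

# Each fold holds out all utterances of one speaker for testing.
scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = SVC().fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(f"speaker-independent accuracy: {np.mean(scores):.3f}")
```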
“…The dialog model proposed by [121] combined three different submodels: an emotional model describing the transitions between user emotional states during the interaction regardless of the data content, a plain dialog model describing the transitions between existing dialog states regardless of the emotions, and a combined model including the dependencies between combined dialog and emotional states. Then, the next dialog state was derived from a combination of the plain dialog model and the combined model.…”
Section: Modeling the User Emotional State
confidence: 99%
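The derivation of the next dialog state from the plain dialog model together with the combined dialog-emotion model can be illustrated with a toy Markov-style sketch. The interpolation rule, the weight, and the tiny state sets below are assumptions for illustration; [121] defines the actual combination.

```python
# Toy sketch: score the next dialog state by interpolating the plain
# dialog model P(d'|d) with the combined model P(d'|d, e).
DIALOG_STATES = ["greeting", "query", "confirm"]

# Plain dialog model: transitions between dialog states, emotion-agnostic.
plain = {
    "greeting": {"greeting": 0.1, "query": 0.8, "confirm": 0.1},
    "query":    {"greeting": 0.0, "query": 0.4, "confirm": 0.6},
    "confirm":  {"greeting": 0.2, "query": 0.5, "confirm": 0.3},
}

# Combined model: transitions conditioned on (dialog state, emotion); here
# already marginalized over the next emotional state for simplicity.
combined = {
    ("query", "anger"):   {"greeting": 0.0, "query": 0.7, "confirm": 0.3},
    ("query", "neutral"): {"greeting": 0.0, "query": 0.3, "confirm": 0.7},
}

def next_dialog_state(d: str, e: str, lam: float = 0.5) -> str:
    """Interpolate both models and return the most likely next state."""
    comb = combined.get((d, e), plain[d])  # fall back to the plain model
    scores = {s: lam * plain[d][s] + (1 - lam) * comb[s] for s in DIALOG_STATES}
    return max(scores, key=scores.get)

print(next_dialog_state("query", "anger"))  # angry users tend to re-query
```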
“…In our proposal, we employ statistical techniques for inferring user acts, which makes it easier to port the system to different application domains. Also, the proposed architecture is modular and thus makes it possible to employ different emotion and intention recognizers, as the intention recognizer is not linked to the dialog manager as in [121].…”
Section: Modeling the User Emotional State
confidence: 99%
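The modularity argument can be made concrete with small interfaces: as long as the dialog manager depends only on abstract recognizer protocols, either recognizer can be swapped without touching the manager. The interface and method names below are illustrative assumptions, not the cited system's API.

```python
from typing import Protocol

class EmotionRecognizer(Protocol):
    def recognize(self, audio: bytes, transcript: str) -> str: ...

class IntentionRecognizer(Protocol):
    def recognize(self, transcript: str) -> str: ...

class DialogManager:
    """Depends only on the two protocols, not on concrete recognizers."""
    def __init__(self, emotion: EmotionRecognizer, intention: IntentionRecognizer):
        self.emotion = emotion
        self.intention = intention

    def next_action(self, audio: bytes, transcript: str) -> str:
        e = self.emotion.recognize(audio, transcript)
        i = self.intention.recognize(transcript)
        # Placeholder policy: soften the response for angry users.
        return f"handle({i})" if e != "anger" else f"apologize_then({i})"
```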