Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification

Scherer, Stefan; Kane, John; Gobl, Christer; Schwenker, Friedhelm

doi:10.1016/j.csl.2012.06.001

Cited by 60 publications

(26 citation statements)

References 17 publications


“…We used various voice quality descriptors including normalized amplitude quotient (NAQ), parabolic spectral parameter (PSP), maxima dispersion quotient (MDQ), quasi-open quotient (QOQ), difference between the first two harmonics (H1-H2), and peak-slope. For more details, readers are referred to [16,17,37].…”

Section: Acoustic Descriptors (mentioning)

confidence: 99%

Computational Analysis of Persuasiveness in Social Multimedia

Park

Shim

Chatterjee

et al. 2014

Proceedings of the 16th International Conference on Multimodal Interaction


Our lives are heavily influenced by persuasive communication, and it is essential in almost any type of social interaction, from business negotiation to conversation with our friends and family. With the rapid growth of social multimedia websites, it is becoming ever more important and useful to understand persuasiveness in the context of social multimedia content online. In this paper, we introduce our newly created multimedia corpus of 1,000 movie review videos obtained from a social multimedia website called ExpoTV.com, which will be made freely available to the research community. Our research results presented here revolve around the following 3 main research hypotheses. Firstly, we show that computational descriptors derived from verbal and nonverbal behavior can be predictive of persuasiveness. We further show that combining descriptors from multiple communication modalities (audio, text and visual) improves the prediction performance compared to using those from a single modality alone. Secondly, we investigate if having prior knowledge of a speaker expressing a positive or negative opinion helps better predict the speaker's persuasiveness. Lastly, we show that it is possible to make comparable predictions of persuasiveness by only looking at thin slices (shorter time windows) of a speaker's behavior.



“…The selection and choice of features is motivated mainly by related work and previous research [32,7,33,34]. Further, they have proven to be robust representatives of the targeted prosodic phenomena.…”

Section: Audio Features (mentioning)

confidence: 99%

“…In [24] and [7] the importance of voice qualities for emotion recognition are investigated and reported. We chose the peak slope parameter for the representation of breathy to tense voice qualities as it has proven to be very robust and successful in voice quality classification tasks [34]. Lastly, the spectral stationarity measure is used as an indicator for monotonicity in speech which is associated with low activity and negative valence [32].…”

Section: Audio Features (mentioning)

confidence: 99%

Step-wise emotion recognition using concatenated-HMM

Ozkan

Scherer

Morency

2012

Proceedings of the 14th ACM International Conference on Multimodal Interaction

Self Cite


Human emotion is an important part of human-human communication, since the emotional state of an individual often affects the way that he/she reacts to others. In this paper, we present a method based on concatenated Hidden Markov Model (co-HMM) to infer the dimensional and continuous emotion labels from audio-visual cues. Our method is based on the assumption that continuous emotion levels can be modeled by a set of discrete values. Based on this, we represent each emotional dimension by step-wise label classes, and learn the intrinsic and extrinsic dynamics using our co-HMM model. We evaluate our approach on the Audio-Visual Emotion Challenge (AVEC 2012) dataset. Our results show considerable improvement over the baseline regression model presented with the AVEC 2012.


“…The abbreviation std indicates that the standard deviation of the observed measure was chosen. [27], depression [28] as well as the features' relevance for characterizing voice qualities on a breathy to tense dimension [26,18]. The first three features are derived from the glottal source signal estimated by iterative adaptive inverse filtering (IAIF, [1]).…”

Section: Acoustic Descriptors (mentioning)

confidence: 99%

Audiovisual behavior descriptors for depression assessment

Scherer

Stratou

Morency

2013

Proceedings of the 15th ACM on International Conference on Multimodal Interaction

Self Cite


We investigate audiovisual indicators, in particular measures of reduced emotional expressivity and psycho-motor retardation, for depression within semi-structured virtual human interviews. Based on a standard self-assessment depression scale we investigate the statistical discriminative strength of the audiovisual features on a depression/no-depression basis. Within subject-independent unimodal and multimodal classification experiments we find that early feature-level fusion yields promising results and confirms the statistical findings. We further correlate the behavior descriptors with the assessed depression severity and find considerable correlation. Lastly, a joint multimodal factor analysis reveals two prominent factors within the data that show both statistical discriminative power as well as strong linear correlation with the depression severity score. These preliminary results based on a standard factor analysis are promising and motivate us to investigate this approach further in the future, while incorporating additional modalities.

