Emotional Audio-Visual Speech Synthesis Based on PAD

Jia, Jia; Zhang, Shen; Meng, F.; Wang, Yongxin; Cai, Lianhong

doi:10.1109/tasl.2010.2052246

Cited by 52 publications

(41 citation statements)

References 24 publications


“…Results show that even though the SVR provides the best performance in the validation of each SSRM for both arousal and valence, the PLS algorithm is more robust to overfitting and thus produces significantly improved performance. Our conclusion is that weak predictors are indeed more suitable to perform boosting than more sophisticated algorithms [50].…”

Section: Comparison Between PLS and SVR (mentioning)

confidence: 75%

“…Results confirm that the prediction of arousal from acoustic features provides significantly better results than for valence. The combination of weak predictors (PLS) in the CRM, which is similar to a boosting strategy [50], provides a performance that is comparable with the one obtained with more complex machine learning methods that are trained on a full set of speakers [26], [37].…”

Section: Overall Performance of the CRM (mentioning)

confidence: 99%


Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models

Mencattini

Martinelli

Ringeval

et al. 2017

IEEE Trans. Affective Comput.


Abstract: Automatic emotion recognition from speech has recently focused on the prediction of time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity found in the definition of a gold standard from a pool of raters and the issue of data scarcity in training models. In this work, we introduce a novel emotion recognition system, based on an ensemble of single-speaker-regression-models (SSRMs). The estimation of emotion is provided by combining a subset of the initial pool of SSRMs, selecting those that are most concordant with one another. The proposed approach allows the addition or removal of speakers from the ensemble without the necessity to re-build the entire machine learning system. The simplicity of this aggregation strategy, coupled with the flexibility assured by the modular architecture, and the promising results obtained on the RECOLA database highlight the potential implications of the proposed method in a real-life scenario and in particular in web-based applications.

“…However, the method is presented in accordance with the project about the analysis of the net-mediated public sentiment, so the universality of the method is insufficient, and it is not recommended to apply the method to extract information from non-news webpages. We will set out to extract news videos and pictures from news webpages [6] in the future work.…”

Section: Results (mentioning)

confidence: 99%

Extracting News Information Based on Webpage Segmentation and Parsing DOM Tree Reversely

Zhang

2015

Communications in Computer and Information Science


Abstract. A new method of extracting news information based on webpage segmentation and parsing DOM tree reversely is presented and implemented in this paper, which intends to effectively extract news information for data mining. The method is proposed to get webpages' main DOM structure by segmenting webpages, further parse the main DOM structure reversely and finally extract news content, headlines, news agents and publication time. The experimental results show that the proposed method has achieved good performance on accuracy and meets the project demands.


“…The PAD model essentially shades the modeling of head and facial gestures from the high-level text semantics, so that we can focus on mapping the PAD descriptors to visual motion features. Toward the PAD parameterization for input text, we adopt the heuristics that are proposed in the PAD based expressive text-to-speech synthesis [24,38]. To extend our approach to talking avatar in other languages, similar PAD parameterizations need to be devised according to the specific language.…”

Section: Discussion (mentioning)

confidence: 99%

Head and facial gestures synthesis using PAD model for an expressive talking avatar

Jia

Zhang

et al. 2013

Multimed Tools Appl

Self Cite


This paper proposes to synthesize expressive head and facial gestures on a talking avatar using the three-dimensional pleasure-displeasure, arousal-nonarousal and dominance-submissiveness (PAD) descriptors of semantic expressivity. The PAD model is adopted to bridge the gap between text semantics and visual motion features with three dimensions of pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. Based on the correlation analysis between PAD annotations and motion patterns derived from the head and facial motion database, we propose to build an explicit mapping from PAD descriptors to facial animation parameters with linear regression and neural networks for head motion and facial expression respectively. A PAD-driven talking avatar in a text-to-visual-speech system is implemented by generating expressive head motions at the prosodic word level based on the (P, A) descriptors of lexical appraisal, and facial expressions at the sentence level according to the PAD descriptors of emotional information. A series of PAD reverse evaluation and comparative perceptual experiments shows that the head and facial gestures synthesized based on the PAD model can significantly enhance the visual expressivity of the talking avatar.
