2018 12th International Conference on Software, Knowledge, Information Management & Applications (SKIMA)
DOI: 10.1109/skima.2018.8631536

Synthesizing Expressive Facial and Speech Animation by Text-to-IPA Translation with Emotion Control

Abstract: Given the complexity of the human facial anatomy, animating facial expressions and lip movements for speech is a very time-consuming and tedious task. In this paper, a new text-to-animation framework for facial animation synthesis is proposed. The core idea is to improve the expressiveness of lip-sync animation by incorporating facial expressions in 3D animated characters. This idea is realized as a plug-in in Autodesk Maya, one of the most popular animation platforms in the industry, such that professional ani…
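The abstract's core idea, layering an emotional facial expression on top of phoneme-driven lip shapes, can be pictured as a per-frame blend of two weight vectors. The sketch below only illustrates that idea and is not the paper's plug-in code; the blend-shape names, the linear mixing rule, and the emotion_strength parameter are assumptions.

```python
# Minimal sketch (assumption): combine lip-sync viseme weights with an
# emotion pose so the mouth shape and the expression coexist on one rig.
def blend_expression(viseme_weights, emotion_weights, emotion_strength=0.6):
    """Return combined blend-shape weights, clamped to [0, 1].

    viseme_weights / emotion_weights: dicts of blend-shape name -> weight.
    emotion_strength: hypothetical global control for how strongly the
    emotion is layered onto the neutral face (not from the paper).
    """
    combined = dict(viseme_weights)
    for shape, weight in emotion_weights.items():
        combined[shape] = min(1.0, combined.get(shape, 0.0) + emotion_strength * weight)
    return combined

# Example: an open "ah" mouth shape plus a happy expression.
print(blend_expression({"jaw_open": 0.8, "lips_wide": 0.2},
                       {"lips_wide": 0.5, "brow_raise": 0.4}))
```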

Cited by 6 publications (6 citation statements) | References 16 publications
“…hundreds of speech sentences and tens of emotions captured from people) required for a statistical model [21] or a clustering model [23]. Using the low-dimensional parameter space, compared to the slider-based system [24], our approach requires a considerably smaller number of parameters for emotion controls.…”
Section: Results
confidence: 99%
“…Although their approach could generate an emotional speech animation through a simple interface of emotion control, training a CAT model requires the collection of a speech and video corpus, which is laborious, whenever a new emotion is introduced. Stef et al. introduced an approach that converted a given text into the International Phonetic Alphabet (IPA), mapped the corresponding lip shape to each symbol, and generated emotional animation by a key-framing technique [24]. Their approach relied on a commercial animation tool, and it was necessary to adjust a large number of emotional parameters to create a desired expression.…”
Section: Emotional Speech Animation
confidence: 99%
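To make the quoted description of the pipeline concrete, the following sketch takes a text fragment already converted to IPA, looks each symbol up in a hypothetical IPA-to-viseme table, and emits a keyframe schedule. In the actual plug-in such keys would be set on a Maya rig; here they are simply returned as data, and the table, timing, and names are placeholders rather than the authors' mapping.

```python
# Hypothetical IPA-symbol -> viseme blend-shape mapping (illustrative only).
IPA_TO_VISEME = {
    "h": "viseme_breath",
    "ə": "viseme_schwa",
    "l": "viseme_L",
    "oʊ": "viseme_O",
}

def keyframe_schedule(ipa_symbols, frame_step=4, hold=1.0):
    """Turn a sequence of IPA symbols into (frame, blend_shape, weight) keys.

    Each symbol peaks at `hold` on its own frame and relaxes to zero on the
    next, a crude stand-in for the key-framing step described above.
    """
    keys = []
    for i, symbol in enumerate(ipa_symbols):
        shape = IPA_TO_VISEME.get(symbol, "viseme_rest")
        frame = i * frame_step
        keys.append((frame, shape, hold))              # peak of the mouth shape
        keys.append((frame + frame_step, shape, 0.0))  # relax before the next symbol
    return keys

# "hello" written roughly as IPA symbols: h, ə, l, oʊ
for key in keyframe_schedule(["h", "ə", "l", "oʊ"]):
    print(key)
```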
“…Similarly, since facial expression and the underlying emotional state of the subject can also affect measurement accuracy, we are interested in normalizing the facial expression. By analyzing 2D [52] and 3D [53] facial information and the associated emotional states, we may also be able to further improve the robustness of the proposed method in the future.…”
Section: Discussion
confidence: 99%
“…Some of these methods target rigged 3D characters or meshes with predefined mouth blend shapes that correspond to speech sounds [33,34,35,36,37,38]; these have primarily focused on mouth motions only and show a finite number of emotions, blinks, and facial action unit movements. Realistic Speech-Driven Facial Animation with GANs (RSDGAN) [42] used a GAN-based approach to produce quality videos.…”
Section: Phoneme and Visemes Generation of Videos
confidence: 99%