2008
DOI: 10.1007/s12193-009-0015-7

Speech driven realistic mouth animation based on multi-modal unit selection

Abstract: This paper presents a novel audio-visual diviseme (viseme pair) instance selection and concatenation method for speech-driven photo-realistic mouth animation. First, an audio-visual diviseme database is built, consisting of the audio feature sequences, intensity sequences, and visual feature sequences of the instances. In the Viterbi-based diviseme instance selection, the accumulative cost is set as the weighted sum of three items: 1) logarithm of concatenation smoothness of the synthesized mouth trajectory; 2…
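The Viterbi-based selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract is truncated after the first cost item, so the second and third items are assumed here to be the intensity and pronunciation mismatches named in the citing statement quoted in this report. The function name `viterbi_select`, its parameters, the use of a negative logarithm for the smoothness term, and the default weights are all hypothetical choices made for illustration.

```python
import math


def viterbi_select(candidates, smoothness, intensity_dist, pron_dist,
                   weights=(1.0, 1.0, 1.0)):
    """Pick one candidate instance per diviseme slot with minimum total cost.

    candidates[t]        : list of candidate instance ids for slot t
    smoothness(p, c)     : value in (0, 1]; higher means a smoother join (assumed)
    intensity_dist(t, c) : non-negative intensity mismatch for slot t (assumed)
    pron_dist(t, c)      : non-negative pronunciation mismatch for slot t (assumed)
    weights              : (w_smooth, w_intensity, w_pron), hypothetical defaults
    """
    w_s, w_i, w_p = weights

    def target(t, c):
        # Cost of using instance c at slot t, independent of its neighbours.
        return w_i * intensity_dist(t, c) + w_p * pron_dist(t, c)

    # best[t][k] = (accumulated cost, index of best predecessor in slot t-1)
    best = [[(target(0, c), None) for c in candidates[0]]]
    for t in range(1, len(candidates)):
        row = []
        for c in candidates[t]:
            # Assumption: the smoothness term enters as -log(smoothness),
            # so a smoother join yields a lower cost.
            cost, back = min(
                (best[t - 1][j][0] - w_s * math.log(smoothness(p, c)), j)
                for j, p in enumerate(candidates[t - 1])
            )
            row.append((cost + target(t, c), back))
        best.append(row)

    # Backtrack from the cheapest final state.
    k = min(range(len(best[-1])), key=lambda i: best[-1][i][0])
    total = best[-1][k][0]
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][k])
        k = best[t][k][1]
    path.reverse()
    return path, total
```

Calling `viterbi_select` with one candidate list per diviseme in the input utterance returns the chosen instance ids and their accumulated cost. The weights trade join smoothness against fidelity to the input speech; setting the first weight to zero reduces the search to an independent best match per slot.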

Cited by 7 publications (8 citation statements)
References 30 publications
“…These mouth images are then used to construct an audio visual diviseme (viseme pair) unit database. For an input audio speech, the multi-modal diviseme unit selection method [8] is adopted by considering the smoothness of the synthesized mouth movements, as well as the similarity of intensity and pronunciation between the input speech and the diviseme unit. The mouth image sequences of the selected diviseme units are then time warped and concatenated.…”
Section: Morphing Of the Background Face Images (mentioning)
confidence: 99%
“…According to the underlying face model, these systems can be categorized into 3D-model-based animation or image-based rendering systems. While 3D graphical models [1,2] provide parametric control but lack of video realism, image-based approaches [6][7][8] have the potential of achieving very high levels of video realism.…”
Section: Introduction (mentioning)
confidence: 99%