Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.
DOI: 10.1109/afgr.2004.1301509

Trainable videorealistic speech animation

Abstract: I describe how to create a generative, videorealistic speech animation module using machine learning techniques. A human subject is first recorded with a video camera as he/she utters a pre-determined speech corpus. After the corpus is processed automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence…
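The abstract outlines a record → learn → synthesize → composite pipeline. Below is a minimal, hypothetical Python sketch of that flow; the function names, the per-phoneme mean-appearance model, and the array shapes are illustrative assumptions that stand in for the paper's actual multidimensional morphable model (MMM).

```python
import numpy as np

# Hypothetical sketch of the record -> learn -> synthesize -> composite
# pipeline described in the abstract. The per-phoneme mean-appearance
# "model" is a toy stand-in for the paper's MMM, not the authors' method.

def learn_visual_speech_module(mouth_frames, phoneme_labels):
    """Learn one prototype mouth appearance per phoneme from the corpus."""
    X = np.stack([f.astype(np.float64) for f in mouth_frames])
    prototypes = {}
    for p in set(phoneme_labels):
        idx = [i for i, lbl in enumerate(phoneme_labels) if lbl == p]
        prototypes[p] = X[idx].mean(axis=0)  # mean mouth image for phoneme p
    return prototypes

def synthesize_utterance(prototypes, phoneme_sequence):
    """Render a novel (unrecorded) utterance as a sequence of mouth frames."""
    return [prototypes[p] for p in phoneme_sequence]

def composite(background_frame, mouth_frame, top_left):
    """Re-composite a synthesized mouth region onto a background frame."""
    out = background_frame.astype(np.float64).copy()
    r, c = top_left
    h, w = mouth_frame.shape[:2]
    out[r:r + h, c:c + w] = mouth_frame
    return out
```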

Cited by 76 publications (87 citation statements)
References 44 publications

“…3. The cluster number is defined empirically and is considerably smaller than that used in the MMM [1]. The reason is that the teeth markers are excluded from the facial feature template, as their appearance cannot be tracked robustly due to the low quality of the video clips.…”
Section: K-means Clustering and 3D Viseme Database
confidence: 99%
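The citing work above builds its viseme database by k-means clustering of facial feature templates, with the teeth markers excluded and an empirically chosen cluster count. A minimal sketch under those assumptions follows; the feature layout (flattened lip-marker coordinates) and the cluster count of 14 are illustrative guesses, not values from the cited paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_viseme_database(feature_templates, n_clusters=14):
    """Cluster per-frame facial feature templates into visemes.

    feature_templates: (n_frames, n_features) array of marker coordinates,
    with teeth markers already excluded because they track unreliably.
    n_clusters is assumed; the cited work chooses it empirically.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(feature_templates)
    # Treat the cluster centroids as the canonical viseme shapes.
    return km.cluster_centers_, labels

# Usage with synthetic data (e.g. 20 lip markers, x/y each, 500 frames):
rng = np.random.default_rng(0)
templates = rng.normal(size=(500, 40))
visemes, assignments = build_viseme_database(templates)
print(visemes.shape)  # (14, 40)
```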
“…The data analysis may be based on machine learning [1,2,4,6,7] or on a probabilistic framework [5]. Ezzat et al. [1] employ a variant of the MMM to synthesize mouth configurations for novel speech. Cao et al. [6] build a data structure called an Anime Graph to encapsulate a motion-captured facial motion database along with speech information.…”
Section: Related Work
confidence: 99%
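For context, the toy sketch below blends prototype mouth images linearly, which is the core idea behind MMM-style synthesis as referenced above. The real MMM also morphs prototype shapes via optical-flow correspondences and solves for a parameter trajectory over time; both are omitted here, so this is an appearance-space approximation only.

```python
import numpy as np

# Toy sketch of MMM-style synthesis: a novel mouth frame expressed as a
# convex combination of prototype images. Shape morphing via optical flow,
# as in the actual MMM, is deliberately omitted.

def mmm_blend(prototype_images, weights):
    """Blend prototype mouth images with the given mixture weights."""
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                          # keep the combination convex
    stack = np.stack(prototype_images).astype(np.float64)
    return np.tensordot(w, stack, axes=1)    # weighted sum over prototypes

# Usage with two synthetic 4x4 "mouth images":
protos = [np.zeros((4, 4)), np.ones((4, 4))]
frame = mmm_blend(protos, [0.25, 0.75])      # 75% of the way to "open"
print(frame[0, 0])                           # 0.75
```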
“…This model may be based on marker point positions [7,9,15,18], 3D scans [3,14,25,28,30], or images [6,12].…”
Section: Introduction
confidence: 99%