2006
DOI: 10.1016/j.image.2005.04.002
Partial linear regression for speech-driven talking head application

Cited by 5 publications (3 citation statements)
References 25 publications (21 reference statements)
“…Acoustic features used in feature-driven synthesis of visual speech have included Mel frequency cepstral coefficients (MFCCs) [14]-[16], [18]-[20]; filter-bank outputs [6]; line spectral pairs/frequencies (LSPs/LSFs) [17], [40]; formant frequencies [21]; linear prediction coefficients (LPCs) [12], [13] or perceptual LPCs (RASTA-PLP) [11]; and several forms of mapping function have been proposed, including vector quantisation or a nearest-neighbour look-up [6]; regression [17], [19]; artificial neural networks [12], [13], [15], [18], [41]; hidden Markov models (HMMs) [11], [42]; and switching linear dynamical systems [14].…”
Section: Related Work
confidence: 99%
“…One approach for objectively measuring the performance of a synthesizer is to re-synthesize a set of test sentences for which the original visual speech is available and measure the distance between key points located about the face [6], [17], [44], [46], [52], [53], within the parameters used to model the visual speech [20], [23], [43], [54]- [56], or in the image pixels [12]. Although this approach is intuitive and simple to compute, there are two main limitations.…”
Section: A. Evaluating Visual Speech Synthesizers
confidence: 99%
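The key-point distance measure described in the excerpt above can be sketched as follows. This is a minimal illustrative example, not the metric of any cited paper; the function name, array shapes, and toy data are assumptions.

```python
import numpy as np

def keypoint_rmse(true_pts, synth_pts):
    """RMS Euclidean distance between matched facial key points.

    true_pts, synth_pts: arrays of shape (frames, points, 2), holding
    (x, y) positions of corresponding key points per video frame.
    """
    # Per-frame, per-point Euclidean distance -> shape (frames, points).
    d = np.linalg.norm(true_pts - synth_pts, axis=-1)
    # Aggregate into a single RMS error over all frames and points.
    return float(np.sqrt(np.mean(d ** 2)))

# Toy data: 10 frames, 20 key points, with a uniform 0.5 offset in x and y.
rng = np.random.default_rng(1)
gt = rng.normal(size=(10, 20, 2))
err = keypoint_rmse(gt, gt + 0.5)  # every point off by sqrt(0.5) ≈ 0.707
```

Real evaluations of this kind compare key points tracked in original footage against those of the re-synthesized face; as the excerpt notes, the same distance can instead be computed in model-parameter space or directly on image pixels.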
“…Many systems use text and the corresponding phoneme string as input and then use concatenation [1], dominance functions [2] or trajectory generation [3] to produce the desired animation. Other approaches use parameterised speech directly as input and then use formant analysis [4], linear regression [5], or probabilistic modelling [6], [7] to generate the appropriate motion.…”
Section: Introduction
confidence: 99%
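The linear-regression mapping mentioned in the excerpt above can be sketched as an ordinary least-squares fit from acoustic feature vectors to visual parameters. This is a hedged illustration only — the paper's *partial* linear regression is a different, more elaborate method; the dimensions (13 MFCCs, 6 visual parameters) and synthetic data here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 200 frames of 13 acoustic features (e.g. MFCCs)
# paired with 6 visual speech parameters per frame.
X = rng.normal(size=(200, 13))                      # acoustic features
W_true = rng.normal(size=(13, 6))                   # ground-truth mapping
Y = X @ W_true + 0.01 * rng.normal(size=(200, 6))   # noisy visual parameters

# Fit the regression weights by ordinary least squares: W = argmin ||XW - Y||.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At synthesis time, predict visual parameters for a new acoustic frame.
x_new = rng.normal(size=(1, 13))
y_pred = x_new @ W  # shape (1, 6)
```

Feature-driven synthesizers of this family apply such a learned mapping frame by frame, typically followed by temporal smoothing of the predicted parameter trajectories.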