2022
DOI: 10.1109/tvcg.2021.3107669

Geometry-Guided Dense Perspective Network for Speech-Driven Facial Animation

Abstract: Realistic speech-driven 3D facial animation is a challenging problem due to the complex relationship between speech and face. In this paper, we propose a deep architecture, called Geometry-guided Dense Perspective Network (GDPnet), to achieve speaker-independent realistic 3D facial animation. The encoder is designed with dense connections to strengthen feature propagation and encourage the re-use of audio features, and the decoder is integrated with an attention mechanism to adaptively recalibrate point-wise f…
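The abstract names two architectural ideas without full detail: dense connections in the audio encoder (feature re-use) and an attention mechanism in the decoder that recalibrates point-wise features. As a rough illustration only, here is a minimal NumPy sketch of both patterns; all shapes, weight names, and the sigmoid-gate form of the recalibration are assumptions, not the paper's actual implementation.

```python
import numpy as np

def dense_block(x, layer_weights):
    """DenseNet-style block: each layer receives the concatenation of the
    input and all preceding layers' outputs, encouraging feature re-use."""
    features = [x]
    for W in layer_weights:
        inp = np.concatenate(features, axis=-1)
        features.append(np.maximum(inp @ W, 0.0))  # linear layer + ReLU
    return np.concatenate(features, axis=-1)

def recalibrate(point_feats, w_gate):
    """Attention-style recalibration (hypothetical form): a learned sigmoid
    gate in (0, 1) scales each point's feature vector adaptively."""
    gate = 1.0 / (1.0 + np.exp(-(point_feats @ w_gate)))  # shape (N, 1)
    return point_feats * gate

# Toy shapes: 4 points, 8 input channels, two dense layers of width 4.
rng = np.random.default_rng(0)
audio_feats = rng.standard_normal((4, 8))
weights = [rng.standard_normal((8, 4)),      # sees the 8 input channels
           rng.standard_normal((12, 4))]     # sees 8 + 4 concatenated channels
encoded = dense_block(audio_feats, weights)  # (4, 8 + 4 + 4) = (4, 16)
decoded = recalibrate(encoded, rng.standard_normal((16, 1)))
```

The dense connections grow the channel dimension by each layer's width (8 → 16 here), while the gate leaves the shape unchanged and only rescales each point's features.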

Cited by 17 publications (10 citation statements)
References 41 publications
“…We compared the results with current state-of-the-art methods [13,15,22] on the VOCASET dataset, directly using the data provided in these papers. Evaluation of mouth synchronization.…”
Section: Experimental Results and Analysis
confidence: 99%
“…There are several methods [10][11][12] to obtain 3D facial parameter representations from 2D monocular videos, but the quality of the synthesized 3D data is limited by the accuracy of 3D reconstruction techniques, which cannot recover subtle 3D changes from 2D videos, so the results may be unreliable. Works that generate 3D facial animations based on 3D meshes [13][14][15] condition on short audio windows, which may cause pauses in lip movements as the speech changes and thereby reduce the realism of the facial motion.…”
Section: Introduction
confidence: 99%
“…The critical contribution of VOCA is that the additional identity control parameters can vary the identity-dependent visual dynamics. Based on VOCA, Liu et al [186] proposed a geometry-guided dense perspective network (GDPnet) with two constraints from different perspectives to achieve a more robust generation. Fan et al [187] proposed a Transformer-based autoregressive VSG model named FaceFormer to encode the long-term audio context information and predict a sequence of 3D face vertices.…”
Section: Vertex-Based Methods
confidence: 99%
“…However, these methods are not applicable to 3D character models that are widely used in 3D games and virtual reality interactions. Therefore, speech-driven 3D facial animation has attracted more attention recently [2,15,6,41,12,35,23,7,5].…”
Section: Speech-Driven 3D Facial Animation
confidence: 99%