2020
DOI: 10.1016/j.patcog.2020.107231

Synthesizing Talking Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network

Cited by 16 publications (3 citation statements)
References: 30 publications
“…Therefore, many researchers have also started to investigate the use of deep convolutional neural networks to extract features for human pose estimation. In recent times, several technical solutions have achieved good performance [13]. A typical example is the OpenPose scheme based on deep convolutional neural networks developed by CMU.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Human emotions can be perceived not only through explicit facial expressions [1], voice information [2], or text cues [3], but also through implicit body language, including eye movements [4], body postures [5], and gait traits [6]. Nonverbal communication plays a major role in recent human-robot interaction (HRI) [7].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Face animation synthesis has attracted increasing attention in academic and industrial fields, and is considered essential in the real-life applications of human-computer interaction, online teaching, film making, virtual reality, and computer games, among others [1,2,3]. Traditionally, facial synthesis in computer-generated imagery (CGI) has been performed using face capture methods.…”
Section: Introduction
Citation type: mentioning
confidence: 99%