2018
DOI: 10.1007/978-3-319-93764-9_35

Generating Talking Face Landmarks from Speech

Abstract: The presence of a corresponding talking face has been shown to significantly improve speech intelligibility in noisy conditions and for hearing impaired population. In this paper, we present a system that can generate landmark points of a talking face from an acoustic speech in real time. The system uses a long short-term memory (LSTM) network and is trained on frontal videos of 27 different speakers with automatically extracted face landmarks. After training, it can produce talking face landmarks from the aco…

Citations: cited by 43 publications (15 citation statements)
References: 20 publications (23 reference statements)
“…Besides leveraging intermediate landmarks for avoiding directly correlating speech audio with irrelevant visual dynamics, we also propose a novel dynamically adjustable loss along with an attention mechanism to enforce the network to focus on audiovisual-correlated regions. It is worth to mention that in a recent audio-driven facial landmarks generation work [8], such irrelevant visual dynamics are removed in the training process by normalizing and identity-removing the facial landmarks. This has been shown to result in more natural synchronization between generated mouth shapes and speech audio.…”
Section: Introduction (mentioning)
confidence: 99%
“…Suwajanakorn et al [46] proposed an interesting technique to automatically edit a video of a given speaker with accurate lip synchronization guided by his own audio in a different speech. This work has spawned in recent years a number of variant methods on the task [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57].…”
Section: Related Work (mentioning)
confidence: 99%
“…The presence of visual cues improves speech comprehension [1], [2], [3], [4] in noisy environments and for the hard-of-hearing population. Consequently, researchers developed systems that can automatically generate talking faces from speech in order to provide the visual cues when they are not available [5], [6], [7], [8], [9], [10], [11], [12]. These systems can increase the accessibility of abundantly available audio-only resources for the hearing impaired population.…”
Section: Introduction (mentioning)
confidence: 99%