2022
DOI: 10.1109/access.2022.3231137
Pose-Aware Speech Driven Facial Landmark Animation Pipeline for Automated Dubbing

Cited by 5 publications (8 citation statements)
References 39 publications
“…For a detailed illustration, refer to Fig 4a and Eqn 4. This mechanism is similar to the approaches used in other studies such as [64] and [65], where models were conditioned with information from the previous frame to guarantee temporal consistency in the context of video generation. Essentially, maintaining consistent colors throughout a video sequence becomes more achievable when the model can "remember" the colors from the previous frame.…”
Section: Temporal Consistency
confidence: 99%
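
As a concrete illustration of the previous-frame conditioning described above, the following is a minimal PyTorch sketch; the module, its channel sizes, and the rollout helper are assumptions for illustration only, not the architecture of [64] or [65].

import torch
import torch.nn as nn

class PrevFrameConditionedGenerator(nn.Module):
    """Generates frame t conditioned on frame t-1 so appearance (e.g. colour) persists."""
    def __init__(self, in_channels=3, hidden=64):
        super().__init__()
        # Input = current conditioning features + previous generated frame,
        # concatenated along the channel axis (hence 2 * in_channels).
        self.net = nn.Sequential(
            nn.Conv2d(2 * in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, in_channels, kernel_size=3, padding=1),
        )

    def forward(self, cond_frame, prev_frame):
        return self.net(torch.cat([cond_frame, prev_frame], dim=1))

def generate_sequence(model, cond_frames, first_frame):
    """Autoregressive rollout: each output is fed back as the next prev_frame."""
    frames, prev = [], first_frame
    for cond in cond_frames:           # each cond: (B, C, H, W)
        prev = model(cond, prev)       # the model "remembers" the previous frame
        frames.append(prev)
    return torch.stack(frames, dim=1)  # (B, T, C, H, W)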
“…Talking Head refers to a computer-generated virtual character, avatar, or animated still image that can speak and emulate human-like facial expressions, lip movements, and emotions based on an audio track. This technology combines artificial intelligence, computer vision, and natural language processing to create a lifelike digital representation of a person that can deliver speech realistically and expressively [9]–[13].…”
Section: Talking Head
confidence: 99%
“…By leveraging deep learning algorithms, the system analyzes audio input, interprets the speech content, and generates synchronized lip movements and facial expressions that closely align with the spoken words [9]–[11]. This process creates a seamless lip-syncing effect, making it appear that the virtual character speaks the dialogue naturally and convincingly.…”
Section: Talking Head
confidence: 99%
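
A minimal sketch of the audio-to-lip-movement mapping this statement describes, assuming per-frame audio features (e.g. MFCCs) and 68-point facial landmarks; the dimensions, names, and recurrent architecture are illustrative assumptions, not the models of [9]–[11].

import torch
import torch.nn as nn

class SpeechToLandmarks(nn.Module):
    def __init__(self, audio_dim=28, hidden=128, n_landmarks=68):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden, num_layers=2, batch_first=True)
        # Predict (x, y) displacements for each landmark relative to a neutral face.
        self.head = nn.Linear(hidden, n_landmarks * 2)

    def forward(self, audio_feats):                 # (B, T, audio_dim)
        h, _ = self.rnn(audio_feats)
        offsets = self.head(h)                      # (B, T, n_landmarks * 2)
        return offsets.view(*offsets.shape[:2], -1, 2)

# Usage: neutral-pose landmarks plus predicted offsets give the animated sequence.
model = SpeechToLandmarks()
mfcc = torch.randn(1, 100, 28)                      # 100 frames of audio features
neutral = torch.zeros(1, 1, 68, 2)                  # neutral-pose landmarks
animated = neutral + model(mfcc)                    # (1, 100, 68, 2)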
“…Within the context of talking head generation and video editing, a number of recent works have explored using diffusion models. Specifically, Stypułkowski et al. (2023), Shen et al. (2023), and Bigioi et al. (2023) were among the first to explore their use for end-to-end talking head generation and audio-driven video editing. All three methods follow a similar auto-regressive, frame-based approach in which the previously generated frame is fed back into the model along with the audio signal and a reference identity frame to generate the next frame in the sequence.…”
Section: Diffusion-based Generation
confidence: 99%
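
To make the quoted auto-regressive scheme concrete, a simplified sampling-loop sketch follows, assuming a plain DDPM-style ancestral sampler and a user-supplied denoiser(x, t, prev_frame, audio_feat, identity_frame) network; the schedule handling and all names are assumptions, not the exact samplers of the cited papers.

import torch

@torch.no_grad()
def sample_next_frame(denoiser, prev_frame, audio_feat, identity_frame, betas):
    """One reverse-diffusion rollout producing a single video frame."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(identity_frame)            # start from pure noise
    for t in reversed(range(len(betas))):
        # The network predicts the noise, given all three conditioning signals.
        eps = denoiser(x, t, prev_frame, audio_feat, identity_frame)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:                                   # add noise except at the final step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

@torch.no_grad()
def generate_video(denoiser, identity_frame, audio_feats, betas):
    """Feed each generated frame back in as conditioning for the next one."""
    frames, prev = [], identity_frame               # bootstrap with the identity frame
    for audio_feat in audio_feats:                  # one audio window per video frame
        prev = sample_next_frame(denoiser, prev, audio_feat, identity_frame, betas)
        frames.append(prev)
    return torch.stack(frames, dim=1)               # (B, T, C, H, W)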