ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054103

End-To-End Generation of Talking Faces from Noisy Speech

Cited by 22 publications (18 citation statements)
References: 12 publications

“…Therefore, a natural progression of this work will be to perform on-line experiments with noise-hardened versions of the synthesizer, such as that proposed by Eskimez et al (2020). Further studies will also look at improving the synthesizer model through the implementation of targeted loss models, informed by the findings of the confusion matrix analysis presented here.…”
Section: Discussion (mentioning; confidence: 99%)

“…Figure 1 shows the system overview, which employs the generative adversarial network (GAN) framework. Our generator network architecture is built based on our previous work [21], with a modification to accept the emotion condition input. For discriminator networks, we use one discriminator to distinguish the emotions expressed in videos, and another discriminator to distinguish the real and generated video frames.…”
Section: Methods (mentioning; confidence: 99%)

“…They further improved their methods with three discriminators [10] that focus on improving the realness of video frames, the continuity between generated frames, and the synchronization between audio and visual data. Eskimez et al [21] proposed an end-to-end talking face generation system that is robust to noisy speech input. The system contains a frame discriminator to improve image quality and a pair discriminator to improve lip-speech synchronization.…”
Section: A. Emotional Talking Face Generation (mentioning; confidence: 99%)