Wolf Lior scite author profile

End to End Lip Synchronization with a Temporal AutoEncoder

We study the problem of synchronizing the lip movement in a video with its audio stream. Our solution finds an optimal alignment using a dual-domain recurrent neural network trained on synthetic data that we generate by dropping and duplicating video frames. Once the alignment is found, we modify the video to bring the two sources into sync. Our method is shown to greatly outperform existing methods from the literature on a variety of existing and new benchmarks. As an application, we demonstrate that it can robustly align text-to-speech generated audio with an existing video stream. Our code and samples are available at https://github.com/itsyoavshalev/End-to-End-Lip-Synchronization-with-a-Temporal-AutoEncoder.
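The synthetic-data idea mentioned in the abstract (dropping and duplicating video frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `desync_frames`, its parameters, and the per-frame independent sampling scheme are all assumptions made for illustration. The sketch works on frame indices only, so the returned list doubles as a ground-truth alignment back to the original, audio-synchronized timeline.

```python
import random


def desync_frames(num_frames, drop_prob=0.1, dup_prob=0.1, seed=None):
    """Simulate a misaligned video by dropping and duplicating frames.

    Hypothetical helper (not from the paper's code). Returns a list of
    original frame indices: entry j is the source frame shown at output
    position j, so the list is also the alignment label.
    """
    if not (0.0 <= drop_prob and 0.0 <= dup_prob and drop_prob + dup_prob <= 1.0):
        raise ValueError("drop_prob and dup_prob must be >= 0 and sum to at most 1")

    rng = random.Random(seed)
    out = []
    for i in range(num_frames):
        r = rng.random()  # uniform in [0, 1)
        if r < drop_prob:
            continue  # frame dropped: the video runs ahead of the audio
        out.append(i)
        if r >= 1.0 - dup_prob:
            out.append(i)  # frame duplicated: the video lags behind the audio
    return out


if __name__ == "__main__":
    mapping = desync_frames(20, drop_prob=0.2, dup_prob=0.2, seed=0)
    print(mapping)
```

Because each frame is either dropped, kept once, or kept twice, the mapping is always non-decreasing, which matches the monotonic nature of a temporal alignment.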

Detect the Unexpected: Novelty Detection in Large Astrophysical Surveys using Fisher Vectors

Rotman, Reis, Poznanski et al., 2019

Semi-Supervised Monaural Singing Voice Separation With a Masking Network Trained on Synthetic Mixtures

Michael, Benaim, Lior, 2018. Preprint.

Unsupervised Microvascular Image Segmentation Using an Active Contours Mimicking Neural Network

Gur, Lior, Golgher et al., 2019. Preprint.

Wolf Lior

NAM: Non-Adversarial Unsupervised Domain Mapping

Deep Meta Functionals for Shape Representation

End to End Lip Synchronization with a Temporal AutoEncoder

Detect the Unexpected: Novelty Detection in Large Astrophysical Surveys using Fisher Vectors

Semi-Supervised Monaural Singing Voice Separation With a Masking Network Trained on Synthetic Mixtures

Unsupervised Microvascular Image Segmentation Using an Active Contours Mimicking Neural Network
