Lele Chen scite author profile

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss

We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we first transfer the audio to a high-level structure, i.e., facial landmarks, and then generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. Humans are sensitive to temporal discontinuities and subtle artifacts in video. To avoid pixel jittering and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate sharper images with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thoughtful experiments on several datasets and real-world samples demonstrate that our method obtains significantly better results than state-of-the-art methods in both quantitative and qualitative comparisons.
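The idea of a pixel-wise loss that is weighted by attention can be illustrated with a minimal sketch. This is not the paper's formulation: the function name `attention_weighted_l1`, the `base_weight` parameter, and the choice of an L1 error scaled by `base_weight + attention` are all assumptions made for illustration. The sketch only shows how an attention map can make errors in audiovisual-correlated regions (for example, the mouth) count more than errors elsewhere.

```python
def attention_weighted_l1(pred, target, attention, base_weight=0.5):
    """Mean absolute error with per-pixel weights (hypothetical sketch).

    pred, target, attention: 2D lists of floats with identical shapes.
    attention values are assumed to lie in [0, 1]; each pixel's error is
    scaled by (base_weight + attention), so attended pixels cost more.
    """
    total = 0.0
    count = 0
    for p_row, t_row, a_row in zip(pred, target, attention):
        for p, t, a in zip(p_row, t_row, a_row):
            total += (base_weight + a) * abs(p - t)
            count += 1
    return total / count


# Toy 2x2 "frames": the bottom-right pixel stands in for the mouth region.
target = [[0.0, 0.0], [0.0, 0.0]]
attention = [[0.0, 0.0], [0.0, 1.0]]

background_error = [[1.0, 0.0], [0.0, 0.0]]  # mistake outside the attended region
mouth_error = [[0.0, 0.0], [0.0, 1.0]]       # same-size mistake inside it

loss_background = attention_weighted_l1(background_error, target, attention)
loss_mouth = attention_weighted_l1(mouth_error, target, attention)
print(loss_background, loss_mouth)
```

In this toy setting the same-magnitude error is penalized three times more when it falls on the attended pixel, which is the qualitative behavior the abstract describes.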

Demonstration of an ultrahigh-sensitivity atom-interferometry absolute gravimeter

Sun et al. 2013

Deep Cross-Modal Audio-Visual Generation

et al. 2017

Cross-modal audio-visual perception has been a long-standing topic in psychology and neurology, and various studies have discovered strong correlations in human perception of auditory and visual stimuli. Despite work in computational multimodal modeling, the problem of cross-modal audio-visual generation has not been systematically studied in the literature. In this paper, we make the first attempt to solve this cross-modal generation problem leveraging the power of deep generative adversarial training. Specifically, we use conditional generative adversarial networks to achieve cross-modal audio-visual generation of musical performances. We explore different encoding methods for audio and visual signals, and work on two scenarios: instrument-oriented generation and pose-oriented generation. Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments. Our experiments using both classification and human evaluations demonstrate that our model has the ability to generate one modality, i.e., audio/visual, from the other modality, i.e., visual/audio, to a good extent. Our experiments on various design choices along with the datasets will facilitate future research in this new problem space.

Lip Movements Generation at a Glance

et al. 2018

Cross-modality generation is an emerging topic that aims to synthesize data in one modality based on information in a different modality. In this paper, we consider one such task: given arbitrary speech audio and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech. To perform well in this task, a model must not only consider the retention of the target identity, the photo-realism of synthesized images, and the consistency and smoothness of lip images in a sequence, but, more importantly, learn the correlations between audio speech and lip movements. To solve these collective problems, we explore the best modeling of the audio-visual correlations in building and training a lip-movement generator network. Specifically, we devise a method to fuse audio and image embeddings to generate multiple lip images at once and propose a novel correlation loss to synchronize lip changes and speech changes. Our final model utilizes a combination of four losses for a comprehensive consideration in generating lip movements; it is trained in an end-to-end fashion and is robust to lip shapes, view angles and different facial characteristics. Thoughtful experiments on three datasets ranging from lab-recorded to lips in-the-wild show that our model significantly outperforms other state-of-the-art methods extended to this task.

Talking-Head Generation with Rhythmic Head Motion

Chen, Cui, Liu et al. 2020

Performance of a cold-atom gravimeter with an active vibration isolator

Zhou, Duan et al. 2012. Phys. Rev. A

A cold rubidium atom fountain interferometry gravimeter with an active vibration isolator is demonstrated. The natural resonance frequency of the active vibration isolator is 0.016 Hz, and the vertical vibration noise is reduced by a factor of 100 in the 0.1 to 1 Hz band. After substantial suppression of the vibration noise, the gravimeter reaches a sensitivity of 5.5 × 10⁻⁸ g/√Hz. We measured the local gravitational acceleration g with this gravimeter, reaching a resolution of 6.5 × 10⁻⁹ g after 60 s and 1.5 × 10⁻⁹ g after 2000 s of integration time, which is comparable to the resolution of state-of-the-art atom gravimeters.
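As a rough consistency check on the figures quoted in this abstract, the sketch below applies the standard white-noise averaging rule, in which resolution improves as sensitivity divided by the square root of the integration time. The assumption that the noise is purely white is ours, not the paper's, and the function name `averaged_resolution` is hypothetical; real instruments deviate from this rule at long integration times.

```python
import math


def averaged_resolution(sensitivity_per_root_hz, integration_time_s):
    """Resolution after averaging, assuming purely white noise.

    sensitivity_per_root_hz: short-term sensitivity in g/sqrt(Hz).
    integration_time_s: averaging time in seconds.
    """
    return sensitivity_per_root_hz / math.sqrt(integration_time_s)


SENSITIVITY = 5.5e-8  # g/sqrt(Hz), the value quoted in the abstract

for tau in (60, 2000):
    print(f"{tau:>5} s: {averaged_resolution(SENSITIVITY, tau):.1e} g")
```

Under this idealized model the estimates come out near 7.1 × 10⁻⁹ g at 60 s and 1.2 × 10⁻⁹ g at 2000 s, the same order of magnitude as the reported 6.5 × 10⁻⁹ g and 1.5 × 10⁻⁹ g.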

Cascade two-stage tumor re-oxygenation and immune re-sensitization mediated by self-assembled albumin-sorafenib nanoparticles for enhanced photodynamic immunotherapy

Zhou, Chen, Liu et al. 2022. Acta Pharmaceutica Sinica B

Observing the effect of wave-front aberrations in an atom interferometer by modulating the diameter of Raman beams

et al. 2016
