Janna Escur scite author profile

Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) on raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion and generates a face directly from the raw speech waveform, without any additional identity information (e.g., a reference image or a one-hot encoding). Our model is trained in a self-supervised manner by exploiting the audio and visual signals that are naturally aligned in videos. To train from video data, we present a novel dataset collected for this work, consisting of high-quality videos of YouTubers who are notably expressive in both their speech and their visual appearance.
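The abstract does not spell out the architecture. As a rough illustration of what conditioning a GAN on raw speech can mean, the sketch below encodes a raw waveform with strided 1-D convolutions, a ReLU and global average pooling, then concatenates that embedding with a noise vector to form the generator input. Everything here is an assumption for illustration: the function names `conv1d`, `speech_embedding` and `generator_input`, the kernel sizes, and the use of a noise vector are hypothetical and are not taken from Wav2Pix.

```python
import random

# Hypothetical sketch: names, sizes and the noise vector are assumptions,
# not the Wav2Pix architecture.

def conv1d(signal, kernel, stride):
    """Valid 1-D cross-correlation of a raw waveform with one kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def speech_embedding(waveform, kernels, stride=4):
    """One feature per kernel: strided conv, ReLU, then global average pooling."""
    features = []
    for kernel in kernels:
        activations = [max(0.0, a) for a in conv1d(waveform, kernel, stride)]
        features.append(sum(activations) / len(activations))
    return features

def generator_input(waveform, kernels, noise_dim, rng):
    """Condition the generator by concatenating Gaussian noise with the speech embedding."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + speech_embedding(waveform, kernels)

if __name__ == "__main__":
    rng = random.Random(0)
    wave = [rng.uniform(-1.0, 1.0) for _ in range(16000)]  # 1 s at an assumed 16 kHz
    kernels = [[rng.uniform(-1.0, 1.0) for _ in range(32)] for _ in range(8)]
    vec = generator_input(wave, kernels, noise_dim=100, rng=rng)
    print(len(vec))  # 108 = 100 noise values + 8 speech features
```

In a real model the kernels would be learned jointly with the generator and discriminator; here they are random only to show the data flow from waveform to generator input.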

Multi-View 3D Face Reconstruction in the Wild Using Siamese Networks

Ramon, Escur, Giró-i-Nieto

2019

In this work, we present a novel learning-based approach to reconstruct 3D faces from one or more images. Our method uses a simple yet powerful architecture based on siamese neural networks that helps extract relevant features from each view while keeping the model small. Instead of minimizing multiple objectives, we propose to learn the 3D shape and the individual camera poses simultaneously using a single-term loss based on the reprojection error, which generalizes from one to multiple views. This allows the whole scene to be optimized globally without tuning any hyperparameters and yields low reprojection errors, which are important for subsequent texture generation. Finally, we train our model on a large-scale dataset with more than 6,000 facial scans. We report competitive results in the 3DFAW 2019 challenge, showing the effectiveness of our method.
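To make the idea of a single-term loss that generalizes from one to multiple views concrete, here is a minimal sketch. It assumes, for illustration only, that the 3D shape is a list of points, that each view has a weak-perspective camera (a 3x3 rotation, a scale and a 2D translation), and that the error is the mean squared 2D distance; the names `project` and `reprojection_loss` are hypothetical and the paper's exact camera model and loss may differ.

```python
# Hypothetical sketch of a single-term, multi-view reprojection loss.
# Assumptions (not from the paper): point-set shape, weak-perspective
# cameras given as (rotation, scale, translation), mean squared 2D error.

def project(point, rotation, scale, translation):
    """Rotate a 3D point, drop depth, then scale and shift in the image plane."""
    rx = sum(r * c for r, c in zip(rotation[0], point))
    ry = sum(r * c for r, c in zip(rotation[1], point))
    return (scale * rx + translation[0], scale * ry + translation[1])

def reprojection_loss(shape, views):
    """Mean squared reprojection error over every point in every view.

    `views` is a list of (camera, observed_2d) pairs, so a single image is
    just the case len(views) == 1: the same term covers one or many views.
    """
    total, count = 0.0, 0
    for camera, observed in views:
        for point, (u, v) in zip(shape, observed):
            pu, pv = project(point, *camera)
            total += (pu - u) ** 2 + (pv - v) ** 2
            count += 1
    return total / count

if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    shape = [(0.0, 0.0, 1.0), (1.0, 2.0, 3.0)]
    cam = (identity, 1.0, (0.0, 0.0))
    perfect = [project(p, *cam) for p in shape]
    shifted = [(u + 1.0, v) for u, v in perfect]
    print(reprojection_loss(shape, [(cam, perfect)]))                  # 0.0
    print(reprojection_loss(shape, [(cam, perfect), (cam, shifted)]))  # 0.5
```

Because shape and cameras enter the same term, minimizing it adjusts both jointly; in a learned setting they would be predicted by the network, and this function would only score those predictions.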

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

Ramon, Triginer, Escur et al.

2021

Preprint

…io/h3d-net

Figure 1. We introduce H3D-Net, a method for high-fidelity 3D head reconstruction in the wild. Our method estimates a signed distance function (SDF) of the head by optimizing a coordinate-based neural network on a small set of input images. This optimization process is constrained by a pre-trained probabilistic model of 3D head SDFs to obtain plausible shapes in few-shot setups. The figure shows the 3D head reconstruction of three scenes obtained with the proposed method from only three images with associated masks and camera poses.
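A signed distance function maps a 3D point to its distance from the surface, negative inside and zero on it. The sketch below illustrates this and shows one common way to turn an SDF into a silhouette mask that could be compared with input masks: sphere tracing along camera rays. It is an assumption-laden illustration, not H3D-Net's method: an analytic sphere (`sphere_sdf`) stands in for the coordinate-based network, the camera is a simple orthographic one looking along +z, and the names `sphere_trace` and `silhouette` are hypothetical. The learned prior over head SDFs is not modeled.

```python
import math

# Hypothetical sketch: an analytic sphere stands in for the coordinate-based
# network, and sphere tracing (assumed, not taken from the paper) renders it.

def sphere_sdf(center, radius):
    """Return f(p): signed distance to a sphere (negative inside, zero on it)."""
    def f(p):
        return math.dist(p, center) - radius
    return f

def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-6, max_dist=100.0):
    """March along a unit-length ray by the SDF value; return hit distance or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t  # close enough to the zero level set: surface hit
        t += dist  # the SDF value is a safe step that cannot overshoot the surface
        if t > max_dist:
            break
    return None

def silhouette(sdf, xs, ys, z0=-3.0):
    """Orthographic mask: True where a +z ray from (x, y, z0) hits the surface."""
    return [
        [sphere_trace(sdf, (x, y, z0), (0.0, 0.0, 1.0)) is not None for x in xs]
        for y in ys
    ]

if __name__ == "__main__":
    sdf = sphere_sdf((0.0, 0.0, 0.0), 1.0)
    print(sdf((0.0, 0.0, 0.0)))                                  # -1.0 (inside)
    print(sphere_trace(sdf, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))  # 2.0
    print(silhouette(sdf, [-2.0, 0.0, 2.0], [0.0]))              # [[False, True, False]]
```

In an optimization setting, the sphere's parameters would be replaced by network weights, and a loss comparing rendered silhouettes with the input masks would drive their update.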

Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks

Duarte, Roldán, Tubau et al.

2019

Preprint
