Ruben Villegas scite author profile

Neural Kinematic Networks for Unsupervised Motion Retargetting

Figure 1: Our end-to-end method retargets a given input motion (top row) to new characters with different bone lengths and proportions (middle and bottom rows). The target characters are never seen performing the input motion during training.

We propose a recurrent neural network architecture with a Forward Kinematics layer and a cycle-consistency-based adversarial training objective for unsupervised motion retargetting. Our network captures the high-level properties of an input motion through the forward kinematics layer and adapts them to a target character with different skeleton bone lengths (e.g., shorter or longer arms). Collecting paired motion training sequences from different characters is expensive. Instead, our network utilizes cycle consistency to learn to solve the Inverse Kinematics problem in an unsupervised manner. Our method works online, i.e., it adapts the motion sequence on-the-fly as new frames are received. In our experiments, we use the Mixamo animation data to test our method on a variety of motions and characters and achieve state-of-the-art results. We also demonstrate motion retargetting from monocular human videos to 3D characters using an off-the-shelf 3D pose estimator.

MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics

Yan, Rastogi, Villegas, et al. 2018

Long-term human motion can be represented as a series of motion modes (motion sequences that capture short-term temporal dynamics) with transitions between them. We leverage this structure and present novel Motion Transformation Variational Auto-Encoders (MT-VAE) for learning motion sequence generation. Our model jointly learns a feature embedding for motion modes (from which the motion sequence can be reconstructed) and a feature transformation that represents the transition from one motion mode to the next. Our model is able to generate multiple diverse and plausible future motion sequences from the same input. We apply our approach to both facial and full-body motion, and demonstrate applications such as analogy-based motion transfer and video synthesis.

* Work partially done during internship with Adobe Research.

Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction

et al. 2015

Object detection systems based on deep convolutional neural networks (CNNs) have recently made groundbreaking advances on several object detection benchmarks. While the features learned by these high-capacity neural networks are discriminative for categorization, inaccurate localization is still a major source of error for detection. Building upon high-capacity CNN architectures, we address the localization problem by 1) using a search algorithm based on Bayesian optimization that sequentially proposes candidate regions for an object bounding box, and 2) training the CNN with a structured loss that explicitly penalizes localization inaccuracy. In experiments, we demonstrate that each of the proposed methods improves detection performance over the baseline method on the PASCAL VOC 2007 and 2012 datasets. Furthermore, the two methods are complementary and significantly outperform the previous state-of-the-art when combined.

Contact and Human Dynamics from Monocular Video

Rempe, Guibas, Hertzmann, et al. 2020

Who Do I Look Like? Determining Parent-Offspring Resemblance via Gated Autoencoders

Dehghan, Ortiz, Villegas, et al. 2014

Recent years have seen a major push for face recognition technology due to the large expansion of image sharing on social networks. In this paper, we consider the difficult task of determining parent-offspring resemblance using deep learning to answer the question "Who do I look like?" Although humans can perform this job at a rate higher than chance, it is not clear how they do it [2]. However, recent studies in anthropology [24] have determined which features tend to be the most discriminative. In this study, we aim not only to create an accurate system for resemblance detection, but also to bridge the gap between studies in anthropology and computer vision techniques. Further, we aim to answer two key questions: 1) Do offspring resemble their parents? and 2) Do offspring resemble one parent more than the other? We propose an algorithm that fuses the features and metrics discovered via gated autoencoders with a discriminative neural network layer that learns the optimal, or what we call genetic, features to delineate parent-offspring relationships. We further analyze the correlation between our automatically detected features and those found in anthropological studies. Our method outperforms the state-of-the-art in kinship verification by 3-10%, depending on the relationship, using specific (father-son, mother-daughter, etc.) and generic models.

Stochastic Scene-Aware Motion Prediction

Hassan, Ceylan, Villegas, et al. 2021

Learning to Generate Long-term Future via Hierarchical Prediction

Villegas, Yang, Zou, et al. 2017

Preprint

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Made with 💙 for researchers

Part of the Research Solutions Family.

Ruben Villegas

Learning Latent Dynamics for Planning from Pixels

Neural Kinematic Networks for Unsupervised Motion Retargetting
