Timo Milbich scite author profile

Large intra-class variation is the result of changes in multiple object characteristics. Images, however, only show the superposition of different variable factors such as appearance or shape. Therefore, learning to disentangle and represent these different characteristics poses a great challenge, especially in the unsupervised case. Moreover, large object articulation calls for a flexible part-based model. We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category. Our model for learning an object representation is trained by simultaneously exploiting invariance and equivariance constraints between synthetically transformed images. Since no part annotation or prior information on an object class is required, the approach is applicable to arbitrary classes. We evaluate our approach on a wide range of object categories and diverse tasks including pose prediction, disentangled image synthesis, and video-to-video translation. The approach outperforms the state-of-the-art on unsupervised keypoint prediction and compares favorably even against supervised approaches on the task of shape and appearance transfer.
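The invariance and equivariance constraints mentioned in this abstract can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the authors' model: the abstract does not give the loss formulation. The cyclic `shift` used as the synthetic transformation, the soft-argmax part location, the heatmap-weighted appearance descriptor, and all function names are hypothetical choices made for illustration. The idea shown is that a part's location should move with a spatial transformation (equivariance), while its appearance descriptor should stay the same (invariance).

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Grid = List[List[float]]


def soft_argmax(heatmap: Grid) -> Point:
    """Expected (row, col) location under a non-negative part activation map."""
    total = sum(sum(row) for row in heatmap)
    r = sum(i * v for i, row in enumerate(heatmap) for v in row) / total
    c = sum(j * v for row in heatmap for j, v in enumerate(row)) / total
    return (r, c)


def shift(grid: Grid, dr: int, dc: int) -> Grid:
    """Hypothetical synthetic transformation: cyclic shift by (dr, dc) pixels."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i - dr) % h][(j - dc) % w] for j in range(w)] for i in range(h)]


def part_appearance(image: Grid, heatmap: Grid) -> float:
    """Toy appearance descriptor: heatmap-weighted mean image intensity."""
    total = sum(sum(row) for row in heatmap)
    weighted = sum(
        image[i][j] * heatmap[i][j]
        for i in range(len(image))
        for j in range(len(image[0]))
    )
    return weighted / total


def equivariance_loss(
    parts: Sequence[Point],
    parts_t: Sequence[Point],
    transform: Callable[[Point], Point],
) -> float:
    """Mean squared distance between transformed original part locations
    and the part locations found in the transformed image."""
    total = 0.0
    for p, q in zip(parts, parts_t):
        tp = transform(p)
        total += (tp[0] - q[0]) ** 2 + (tp[1] - q[1]) ** 2
    return total / len(parts)


def invariance_loss(desc: Sequence[float], desc_t: Sequence[float]) -> float:
    """Mean squared difference between appearance descriptors of the same
    parts before and after a spatial transformation."""
    return sum((a - b) ** 2 for a, b in zip(desc, desc_t)) / len(desc)


# Toy data: a 16x16 image and one part heatmap (3x3 blob centred at row 5, col 8).
H, W = 16, 16
image = [[float((3 * i + 5 * j) % 11) for j in range(W)] for i in range(H)]
heat = [[1.0 if 4 <= i <= 6 and 7 <= j <= 9 else 0.0 for j in range(W)]
        for i in range(H)]

# Apply the same synthetic shift to the image and to the part heatmap.
dr, dc = 3, -2
image_t, heat_t = shift(image, dr, dc), shift(heat, dr, dc)

eq = equivariance_loss(
    [soft_argmax(heat)],
    [soft_argmax(heat_t)],
    lambda p: (p[0] + dr, p[1] + dc),
)
inv = invariance_loss(
    [part_appearance(image, heat)],
    [part_appearance(image_t, heat_t)],
)
print("equivariance loss:", eq, "invariance loss:", inv)
```

In this toy setting the part heatmap is shifted by hand, so both losses vanish by construction. In a learned model the heatmaps would come from a network, and losses of this kind would penalize part detections that fail to follow the transformation.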


DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

Milbich, Roth, Bharadhwaj, et al. 2020

PADS: Policy-Adapted Sampling for Visual Similarity Learning

Roth, Milbich, Ommer, 2020

Unsupervised Video Understanding by Reconciliation of Posture Similarities

Milbich, Bautista, Sutter, et al. 2017

Understanding human activity and being able to explain it in detail surpasses mere action classification by far in both complexity and value. The challenge is thus to describe an activity on the basis of its most fundamental constituents, the individual postures and their distinctive transitions. Supervised learning of such a fine-grained representation based on elementary poses is very tedious and does not scale. Therefore, we propose a completely unsupervised deep learning procedure based solely on video sequences, which starts from scratch without requiring pre-trained networks, predefined body models, or keypoints. A combinatorial sequence matching algorithm proposes relations between frames from subsets of the training data, while a CNN is reconciling the transitivity conflicts of the different subsets to learn a single concerted pose embedding despite changes in appearance across sequences. Without any manual annotation, the model learns a structured representation of postures and their temporal development. The model not only enables retrieval of similar postures but also temporal super-resolution. Additionally, based on a recurrent formulation, next frames can be synthesized.


Stochastic Image-to-Video Synthesis using cINNs

Dorkenwald, Milbich, Blattmann, et al. 2021


Timo Milbich

Unsupervised Part-Based Disentangling of Object Shape and Appearance
