iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

Blattmann, Andreas; Milbich, Timo; Dorkenwald, Michael; Ommer, Björn

doi:10.1109/iccv48922.2021.01444

Cited by 13 publications

(15 citation statements)

References 38 publications


“…These methods [4,12,15,15,16,31] generate a video from a single still image automatically and thus do not allow the user interaction to control the animation. Different from the above set of works, [1,2,5,6,10,11] allow the users to interact and control the movement in the animation to varying degrees, and hence are more closely related to the current work. Dorkenwald et al. [5] propose a one-to-one mapping between image and video using a residual representation, that allows the user to provide a single direction of motion for video generation.…”

Section: Related Work (mentioning)

confidence: 99%

“…Dorkenwald et al. [5] propose a one-to-one mapping between image and video using a residual representation, that allows the user to provide a single direction of motion for video generation. [1] and [2] propose methods that govern the animation of different parts in the image with a single poke at a particular location defined by the start and end location of the motion. However, these methods [1,2,5] are unsuitable for our problem that necessitates the use of a sparse set of input directions and speeds at arbitrary locations.…”

Section: Related Work (mentioning)

confidence: 99%

“…[1] and [2] propose methods that govern the animation of different parts in the image with a single poke at a particular location defined by the start and end location of the motion. However, these methods [1,2,5] are unsuitable for our problem that necessitates the use of a sparse set of input directions and speeds at arbitrary locations. The closest approach to our work is [11].…”

Section: Related Work (mentioning)

confidence: 99%

“…There has been a rich body of work [1, 3, 5, 7, 10-12, 17, 21, 26] that has focused on generating animations from still images. While [3,7,12,17,21,26] focus on uncontrollable image-to-video synthesis, attempts [1,2,5,10,11] have been made for controllable image-to-video synthesis with the user-provided direction of the motion of the objects in the images. While these methods provide some control to the user, they suffer from certain drawbacks.…”

Section: Introduction (mentioning)

confidence: 99%

“…While these methods provide some control to the user, they suffer from certain drawbacks. Specifically, [1,2,5] either allow the user to poke at just a single pixel location or provide a single user direction. Halperin et.…”

Section: Introduction (mentioning)

confidence: 99%


Controllable Animation of Fluid Elements in Still Images

Mahapatra, Kulkarni. 2021. Preprint.



Invertible Neural Networks for Understanding Semantics of Invariances of CNN Representations

Rombach, Esser, Blattmann, et al. 2022

Deep Neural Networks and Data for Automated Driving


To tackle increasingly complex tasks, it has become an essential ability of neural networks to learn abstract representations. These task-specific representations and, particularly, the invariances they capture turn neural networks into black-box models that lack interpretability. To open such a black box, it is, therefore, crucial to uncover the different semantic concepts a model has learned as well as those that it has learned to be invariant to. We present an approach based on invertible neural networks (INNs) that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these recovered invariances combined with the model representation into an equally expressive one with accessible semantic concepts. As a consequence, neural network representations become understandable by providing the means to (i) expose their semantic meaning, (ii) semantically modify a representation, and (iii) visualize individual learned semantic concepts and invariances. Our invertible approach significantly extends the abilities to understand black-box models by enabling post hoc interpretations of state-of-the-art networks without compromising their performance. Our implementation is available at https://compvis.github.io/invariances/.
