2016
DOI: 10.1007/978-3-319-46484-8_16

Learning Temporal Transformations from Time-Lapse Videos

Abstract: Based on life-long observations of physical, chemical, and biologic phenomena in the natural world, humans can often easily picture in their minds what an object will look like in the future. But, what about computers? In this paper, we learn computational models of object transformations from time-lapse videos. In particular, we explore the use of generative models to create depictions of objects at future times. These models explore several different prediction tasks: generating a future state give…
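The abstract describes generative models that render an object's future appearance from an input frame. As a rough illustration only, and not the paper's actual architecture, the sketch below assumes a simple encoder-decoder convolutional network trained with a pixel-wise reconstruction loss on (current frame, future frame) pairs drawn from time-lapse clips; all layer sizes, names, and hyperparameters are hypothetical.

```python
# Minimal sketch of a pixel-level future-state predictor: an encoder-decoder
# CNN mapping one input frame to a predicted frame at a later time.
# Illustrative assumption only; this is not the architecture from the paper.
import torch
import torch.nn as nn

class FutureFramePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress a 64x64 RGB frame into a compact feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), # 16 -> 8
            nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to image resolution and predict the
        # object's appearance at a future time step.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),    # 32 -> 64
            nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def forward(self, frame):
        return self.decoder(self.encoder(frame))

if __name__ == "__main__":
    model = FutureFramePredictor()
    # A (current_frame, future_frame) pair would come from a time-lapse clip;
    # random tensors stand in for real data here.
    current = torch.rand(8, 3, 64, 64)
    future = torch.rand(8, 3, 64, 64)
    predicted = model(current)
    loss = nn.functional.mse_loss(predicted, future)  # plain reconstruction loss
    loss.backward()
    print(predicted.shape, loss.item())
```

An adversarial or class-conditioned objective, as in some of the citing works quoted below, could replace the plain reconstruction loss used in this sketch.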

Cited by 119 publications (105 citation statements)
References 19 publications
“…The notion of visual "states" has been explored from several angles. Given a collection of images [20] or time-lapse videos [60,27], methods can discover transformations that map between object states in order to create new images or visualize their relationships. Given video input, action recognition can be posed as learning the visual state transformation, e.g., how a person manipulates an object [12,2] or how activity preconditions map to postconditions [51].…”
Section: Related Work
Mentioning confidence: 99%
“…Mikolov et al [38] showcased the composition additive property of word vectors learned in an unsupervised way from language data; Kulkarni et al [39], Reed et al [40] suggested that additive transformation can be achieved via reconstruction or prediction task by learning from parallel paired image data. In the video domain, Wang et al [41] studied a transformation-aware representation for semantic human action classification; Zhou et al [42] investigated time-lapse video generation given additional class labels.…”
Section: Related Work
Mentioning confidence: 99%
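The quoted passage above refers to the additive composition property of word vectors. As a toy illustration only, assumed here and not taken from the cited papers, the sketch below solves an analogy by vector arithmetic using hand-made 3-d embeddings; real word2vec vectors have hundreds of dimensions.

```python
# Toy demonstration of additive composition of embeddings:
# vec("king") - vec("man") + vec("woman") is closest to vec("queen").
# The vectors are hypothetical values chosen purely for illustration.
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.1, 0.9, 0.5]),
}

def closest(query, exclude):
    """Return the vocabulary word with the highest cosine similarity to
    `query`, skipping the words used to form the query."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {w: cos(query, v) for w, v in vectors.items() if w not in exclude}
    return max(scores, key=scores.get)

analogy = vectors["king"] - vectors["man"] + vectors["woman"]
print(closest(analogy, exclude={"king", "man", "woman"}))  # -> "queen"
```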
“…Other works introduce recurrent networks into video generation (e.g. [27,28]). In line with these works, our method separately models the motion and appearance as well, using the PSGAN and SCGAN respectively.…”
Section: Pose Sequences
Mentioning confidence: 99%