2021
DOI: 10.1109/tpami.2019.2934852

Learning Energy-Based Spatial-Temporal Generative ConvNets for Dynamic Patterns

Abstract: Video sequences contain rich dynamic patterns, such as dynamic texture patterns that exhibit stationarity in the temporal domain, and action patterns that are non-stationary in either the spatial or temporal domain. We show that an energy-based spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns. The model defines a probability distribution on the video sequence, and the log probability is defined by a spatial-temporal ConvNet that consists of multiple layers of spatial-tempora…
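For readers unfamiliar with the formulation, the energy-based model sketched in the abstract is conventionally written as an unnormalized density whose log is the scalar output of the spatial-temporal ConvNet. A minimal sketch of the standard form (the paper's exact definition may additionally include a Gaussian reference distribution; f_\theta here denotes the ConvNet's scalar output):

    p_\theta(V) = \frac{1}{Z(\theta)} \exp\big(f_\theta(V)\big), \qquad Z(\theta) = \int \exp\big(f_\theta(V)\big)\, dV

The ConvNet output f_\theta(V) plays the role of a negative energy, and the intractable normalizer Z(\theta) is sidestepped during learning by drawing MCMC samples from the model.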

Cited by 36 publications (33 citation statements)
References 61 publications (93 reference statements)

“…In such a case, new point clouds can be generated using Markov chain Monte Carlo (MCMC) sampling. Such an architecture was used to generate images [21], videos [22], [23], [24], and 3D voxels [25], [26]. One of the most recent models, Generative PointNet (GPN), applies this approach to 3D point clouds [27].…”
Section: Related Work (mentioning)
confidence: 99%
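The MCMC sampler typically used with such energy-based generators is Langevin dynamics. The following is a minimal, self-contained sketch, not code from the cited papers; the toy quadratic energy, step size, and step count are illustrative assumptions:

```python
import numpy as np

def langevin_sample(grad_energy, x0, step_size=0.01, n_steps=100, rng=None):
    """Draw an approximate sample from p(x) ∝ exp(-E(x)) via Langevin dynamics.

    grad_energy: callable returning ∇E(x); step_size and n_steps are tuning knobs.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Langevin update: gradient descent on the energy plus Gaussian noise.
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

# Toy example: E(x) = ||x||^2 / 2, so samples approach a standard Gaussian.
sample = langevin_sample(lambda x: x, x0=np.zeros(3))
```

The same update applies whether x is an image, a video tensor, or a point cloud; only the energy network behind grad_energy changes.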
“…Classic synthesis methods rely on mathematical models such as Markov random fields [56] and auto-regressive moving average models [16] to capture the underlying motion characteristics. More recently, deep learning techniques, in particular 3D CNNs and GAN-based training [19,55,58,62,63], have been adopted to achieve more realistic synthesis results. It should be noted that both dynamic texture synthesis and VFI require accurate modeling of spatio-temporal characteristics.…”
Section: Dynamic Texture Synthesis (mentioning)
confidence: 99%
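As a concrete reference point for the "classic" models named in this excerpt, a linear dynamic texture model evolves a low-dimensional latent state auto-regressively and projects it to frames. The sketch below is a simplified AR(1) variant with made-up toy dimensions and noise scales, not the cited method:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, frame_dim, n_frames = 8, 64, 30

A = 0.95 * np.eye(state_dim)                      # stable state transition
C = rng.standard_normal((frame_dim, state_dim))   # state-to-frame projection

x = rng.standard_normal(state_dim)
frames = []
for _ in range(n_frames):
    x = A @ x + 0.1 * rng.standard_normal(state_dim)  # AR(1) latent dynamics
    frames.append(C @ x)                              # render one frame
```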
“…Including additional frames here allows better modeling of higher-order motions and also provides more information on longer-term spatiotemporal characteristics. Motivated by recent work in dynamic texture synthesis [58,63], where spatio-temporal filtering was found to be effective for generating coherent video textures, we integrate a 3D CNN for texture enhancement. This CNN architecture (shown in Figure 4) is a modified version of the network developed in [29], but with reduced layer widths.…”
Section: Texture Enhancement Network (mentioning)
confidence: 99%
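To make the 3D-CNN idea concrete, the sketch below stacks a few spatio-temporal convolutions over a stacked-frame input and predicts a residual correction. The layer widths and kernel sizes are assumptions for illustration, not the architecture of [29] or the network in the citing paper's Figure 4:

```python
import torch
import torch.nn as nn

class TextureEnhancer3D(nn.Module):
    """Toy spatio-temporal CNN: input/output are (batch, channels, time, H, W)."""
    def __init__(self, channels=3, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(width, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(width, channels, kernel_size=3, padding=1),
        )

    def forward(self, video):
        # Residual refinement: predict a correction to the input frames.
        return video + self.net(video)

enhanced = TextureEnhancer3D()(torch.randn(1, 3, 5, 64, 64))
```

Because every convolution is 3D, each output voxel pools information across neighboring frames as well as neighboring pixels, which is what makes such filters suited to coherent video textures.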
“…Energy-based models: Recent works have shown strong performance of data space EBMs [72,44] in modeling high-dimensional complex dependencies, such as images [97,95,18,11,19], videos [78,79], 3D shapes [75,76], and point clouds [73], and also demonstrated the effectiveness of latent space EBMs [47] in improving the model expressivity for text [48], image [47], and trajectory [49] generation. Our paper also learns a latent space EBM as the prior model but builds the EBM on top of a vision transformer generator for image-conditioned saliency map prediction.…”
Section: Vision Transformers (mentioning)
confidence: 99%
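The latent-space EBM prior mentioned in this excerpt is commonly written as an exponential tilting of a Gaussian base prior. A sketch of the standard formulation (the symbols f_\alpha, z, and g_\beta follow common EBM-prior notation and are not quoted from the cited papers):

    p_\alpha(z) \propto \exp\big(f_\alpha(z)\big)\, \mathcal{N}(z; 0, I), \qquad x = g_\beta(z) + \epsilon

Sampling first draws z from the tilted prior, again typically via short-run MCMC, and then decodes it with the generator g_\beta; in the citing paper's setting, a vision-transformer generator producing the saliency map.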