Gradient Forward-Propagation for Large-Scale Temporal Video Modelling

Malinowski, Mateusz; Vytiniotis, Dimitrios; Świrszcz, Grzegorz; Pătrăucean, Viorica; Carreira, João

doi:10.1109/cvpr46437.2021.00913

Cited by 2 publications

(1 citation statement)

References 46 publications


“…Sparse Network [14] is applied to image recognition models but can only save the memory theoretically. Sideways [35,36] reduce the memory cost by overwriting activations whenever new ones become available but can only be applied to causal models. Regarding video specific methods, a popular paradigm for training temporal action detectors is to build the model upon pre-extracted features for temporal modeling and reasoning ("Freeze Backbone" in Fig.…”

Section: Related Work (classified as mentioning, confidence: 99%)
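The quoted passage says Sideways saves memory by overwriting activations as soon as newer ones arrive, which is why it is limited to causal models. Below is a minimal Python sketch of that overwrite-in-place idea. It assumes a simple depth-parallel pipeline in which each layer keeps exactly one output buffer. The helper `pipelined_forward`, the scalar stand-ins for frames, and the toy layers are hypothetical illustrations, not the authors' code. Only the forward pipeline is shown. The delayed gradient propagation of the actual method is not modelled.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def pipelined_forward(layers: Sequence[Callable[[float], float]],
                      frames: Sequence[float]) -> Tuple[List[float], int]:
    """Hypothetical sketch: stream frames through a layer pipeline in which
    every layer keeps ONE buffer that is overwritten on each clock tick.

    Returns the per-frame outputs and the peak number of live buffers.
    """
    n = len(layers)
    buffers: List[Optional[float]] = [None] * n  # buffers[l]: latest output of layer l
    outputs: List[float] = []
    peak_live = 0
    # Pad with n - 1 empty ticks so the last frame drains through the pipeline.
    for x in list(frames) + [None] * (n - 1):
        # Update the deepest layer first, so each layer reads what its
        # predecessor wrote on the PREVIOUS tick before it is overwritten.
        for l in reversed(range(n)):
            inp = x if l == 0 else buffers[l - 1]
            buffers[l] = None if inp is None else layers[l](inp)
        peak_live = max(peak_live, sum(b is not None for b in buffers))
        if buffers[-1] is not None:
            outputs.append(buffers[-1])
    return outputs, peak_live


if __name__ == "__main__":
    toy_layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    print(pipelined_forward(toy_layers, [0, 1, 2, 3]))
```

Frames are consumed strictly in order, and each layer holds only its most recent output. As a result, the number of live activations stays bounded by the network depth (`peak_live`) rather than growing with clip length. The same ordering is why a scheme like this presumes a causal model.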

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

Cao, Xu, Xiong, et al. (2022)

Preprint


We propose a memory-efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. It is based on the finding that gradients computed from an incomplete execution of backpropagation can still train the models effectively with minimal accuracy loss, which is attributable to the high redundancy of video. SBP keeps all forward paths but randomly and independently removes the backward paths for each network layer in each training step. This reduces the GPU memory cost by eliminating the need to cache the activation values corresponding to the dropped backward paths; the amount dropped is controlled by an adjustable keep-ratio. Experiments show that SBP can be applied to a wide range of models for video tasks, yielding up to 80.0% GPU memory savings and a 10% training speedup with less than a 1% accuracy drop on action recognition and temporal action detection.
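The mechanism described in this abstract can be illustrated with a toy NumPy sketch. It makes several assumptions: a stack of tanh layers applied frame by frame, an illustrative loss of `0.5 * sum(out**2)`, and the reading that "removing a backward path" means dropping a random subset of frames independently per layer. That reading is one plausible interpretation, not the paper's exact procedure. The function `sbp_step` and its parameters are hypothetical names. The forward pass still runs on every frame. Only the kept frames' activations are cached, and dropped frames receive zero gradient.

```python
import numpy as np


def sbp_step(weights, x, keep_ratio, rng):
    """Hypothetical toy of stochastic backpropagation on a per-frame tanh MLP.

    weights: list of (D_in, D_out) arrays; x: (T, D) per-frame features.
    Loss (illustrative only): 0.5 * sum(output ** 2).
    Returns (loss, grads, cached_rows); cached_rows counts the frame
    activations stored for the backward pass.
    """
    T = x.shape[0]
    n_keep = max(1, int(round(keep_ratio * T)))
    caches, h = [], x
    for W in weights:
        # Each layer samples its own kept frames, independently of the others.
        keep = np.sort(rng.choice(T, size=n_keep, replace=False))
        y = np.tanh(h @ W)                        # forward path runs on ALL frames
        caches.append((keep, h[keep], y[keep]))   # only kept frames are cached
        h = y
    loss = 0.5 * float(np.sum(h ** 2))

    g = h                                         # dLoss/dOutput for the toy loss
    grads = [None] * len(weights)
    for i in reversed(range(len(weights))):
        keep, h_kept, y_kept = caches[i]
        g_pre = g[keep] * (1.0 - y_kept ** 2)     # backprop through tanh, kept frames only
        grads[i] = h_kept.T @ g_pre               # weight gradient from kept frames
        g = np.zeros((T, weights[i].shape[0]))    # dropped frames get no gradient
        g[keep] = g_pre @ weights[i].T
    cached_rows = sum(len(c[0]) for c in caches)
    return loss, grads, cached_rows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ws = [rng.standard_normal((4, 4)) * 0.5 for _ in range(3)]
    frames = rng.standard_normal((8, 4))
    for r in (1.0, 0.25):
        _, _, rows = sbp_step(ws, frames, r, np.random.default_rng(1))
        print(f"keep_ratio={r}: cached frame activations = {rows}")
```

With `keep_ratio=1.0` every frame is kept, and the function reduces to ordinary backpropagation. With `keep_ratio=0.25` on 8 frames, each layer caches activations for 2 frames instead of 8. This is the kind of memory trade-off the keep-ratio is described as controlling.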

