2019
DOI: 10.1007/978-3-030-11018-5_25

Training Compact Deep Learning Models for Video Classification Using Circulant Matrices

Abstract: In real-world scenarios, model accuracy is hardly the only factor to consider. Large models consume more memory and are more computationally intensive, which makes them difficult to train and deploy, especially on mobile devices. In this paper, we build on recent results at the crossroads of linear algebra and deep learning which demonstrate how imposing a structure on large weight matrices can be used to reduce the size of the model. We propose very compact models for video classification based on state-of…

Cited by 8 publications (3 citation statements)
References 21 publications (46 reference statements)
“…An alternative to pruning is to enforce sparse or structured matrices a priori. Circulant matrix structures save weights by shifting and reusing the same row [1]. Alternatively, the frequency domain can be used to impose sparse diagonal patterns on the network weight matrices.…”
Section: B. Network Compression
confidence: 99%
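To illustrate the parameter saving behind this statement: an n × n circulant matrix is fully determined by a single length-n vector, and its matrix-vector product reduces to a circular convolution computable with FFTs. The sketch below is a minimal NumPy illustration of that idea, not the cited paper's implementation; the function name and the sanity check are ours.

import numpy as np

def circulant_matvec(c, x):
    """Compute C @ x where C is the circulant matrix with first column c.

    Only the length-n vector c is stored instead of the full n x n matrix,
    and the product costs O(n log n) because circulant matrices are
    diagonalized by the Fourier basis.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Sanity check against the explicitly materialized matrix: column j of C is
# c circularly shifted by j, i.e. C[i, j] = c[(i - j) mod n].
n = 8
rng = np.random.default_rng(0)
c, x = rng.standard_normal(n), rng.standard_normal(n)
C = np.stack([np.roll(c, j) for j in range(n)], axis=1)
assert np.allclose(C @ x, circulant_matvec(c, x))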
“…Motivated by these applications, extensive studies have recently been conducted on designing compact architectures [44,27,28,64,2] or compressing models [13,57,9,35]. However, most existing methods process all the frames in a given video at the same resolution.…”
Section: Introduction
confidence: 99%
“…In particular, owing to its scale insensitivity, average pooling allows ResNet models pre-trained on one input size to be effectively evaluated on other input sizes with favourable results [12]. In addition, average pooling is known to be more robust than max pooling against outliers and noise [195]. It is also a conceptually simple method that has been empirically verified to consistently outperform max pooling across a range of CNN architectures [196].…”
Section: Downsampling Operation
confidence: 99%
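The scale-insensitivity point can be made concrete: because global average pooling collapses any H × W feature map to one value per channel, the classifier's input dimension does not depend on the input resolution, so the same weights run at several input sizes. The PyTorch sketch below uses an assumed toy architecture (not an actual ResNet and not from [12]) purely to demonstrate this property.

import torch
import torch.nn as nn

# Toy network (illustrative assumption): the AdaptiveAvgPool2d(1) layer maps
# (N, 64, H, W) -> (N, 64, 1, 1) for any H and W, so the final linear layer
# never sees the spatial size.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # global average pooling over the feature map
    nn.Flatten(),
    nn.Linear(64, 10),
)

# The same weights are evaluated at two different input resolutions.
for size in (224, 320):
    logits = model(torch.randn(1, 3, size, size))
    print(size, tuple(logits.shape))  # -> (1, 10) in both cases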