2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv45572.2020.9093546

Filter Distillation for Network Compression

Abstract: Multiview Self-Supervised Learning (MSSL) is based on learning invariances with respect to a set of input transformations. However, invariance partially or totally removes transformation-related information from the representations, which might harm performance for specific downstream tasks that require such information. We propose 2D strUctured and approximately EquivarianT representations (coined DUET), which are 2d representations organized in a matrix structure, and equivariant with respect to transformati…

Cited by 34 publications (26 citation statements)
References 80 publications
“…Static sparse training with random pruning samples masks within each layer in a random fashion based on pre-defined layer-wise sparsities. The most naive approach is pruning each layer uniformly with the same pruning ratio, i.e., uniform pruning (Mariet & Sra, 2016; He et al., 2017; Suau et al., 2019; Gale et al., 2019). Mocanu et al. (2016) proposed a non-uniform and scale-free topology, showing better performance than the dense counterpart when applied to restricted Boltzmann machines (RBMs).…”
Section: Static Sparse Training
confidence: 99%
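
To make the excerpt above concrete, the following Python sketch samples one fixed random binary mask per layer with a shared pruning ratio, i.e., static uniform random pruning. The layer shapes, the 80% sparsity, and the helper name random_uniform_masks are illustrative assumptions, not details taken from the cited papers.

# A minimal sketch of static uniform random pruning: every layer gets the
# same pruning ratio, and its mask is sampled at random before training and
# then kept fixed ("static sparse training"). Shapes and sparsity are
# illustrative assumptions.
import torch

def random_uniform_masks(layer_shapes, sparsity=0.8, seed=0):
    """Return one fixed binary mask per layer, all with the same sparsity."""
    gen = torch.Generator().manual_seed(seed)
    masks = []
    for shape in layer_shapes:
        numel = int(torch.tensor(shape).prod())
        n_keep = numel - int(sparsity * numel)
        perm = torch.randperm(numel, generator=gen)
        mask = torch.zeros(numel)
        mask[perm[:n_keep]] = 1.0           # keep a random subset of weights
        masks.append(mask.reshape(shape))
    return masks

# Usage: apply the fixed masks to the weights at every training step.
shapes = [(64, 3, 3, 3), (128, 64, 3, 3)]   # hypothetical conv layer shapes
masks = random_uniform_masks(shapes, sparsity=0.8)
weights = [torch.randn(s) for s in shapes]
sparse_weights = [w * m for w, m in zip(weights, masks)]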
“…Filter decomposition approaches decompose network matrices into several bases for vector spaces to estimate the informative parameters of the DNNs with low-rank approximation/factorization, thus reducing the computation cost of the network [25]; examples include SVD [5], CP decomposition [21], Tucker decomposition [19], and others. [18] suggests methods to approximate convolutional operations by representing the weight matrix as a smaller basis set of 2D separable filters without changing the original number of filters. In [40], Principal Component Analysis (PCA) was applied to max-pooled and flattened feature maps to compute the amount of information to be preserved in each layer among all layers, enabling integration with each other.…”
Section: Related Work
confidence: 99%
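
The PCA step attributed to [40] in the excerpt above can be sketched as follows: compute the principal components of pooled, flattened activations for a layer and count how many are needed to retain a chosen fraction of the variance. The 0.95 energy threshold, the synthetic activations, and the helper name components_to_keep are assumptions for illustration, not the paper's exact procedure.

# A minimal sketch: PCA on max-pooled, flattened feature maps of a layer,
# counting how many principal components retain a chosen fraction of the
# activation variance. Threshold and data are illustrative assumptions.
import torch

def components_to_keep(features, energy=0.95):
    """features: (num_samples, num_filters) pooled and flattened activations."""
    centered = features - features.mean(dim=0, keepdim=True)
    # Singular values of the centered data give the per-component variance.
    _, s, _ = torch.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    cumulative = torch.cumsum(var, dim=0) / var.sum()
    return int((cumulative < energy).sum()) + 1  # smallest k reaching the threshold

# Usage: per-layer "information to preserve" from hypothetical activations.
feats = torch.randn(1000, 256)                  # 1000 samples, 256 filters
k = components_to_keep(feats, energy=0.95)
print(f"keep {k} of 256 filter directions")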
“…Structured pruning, as considered in this work, prunes parameter groups instead of individual weights, allowing speedups to be achieved without sparse computation (Li et al., 2016). Many empirical heuristics for structured pruning exist; e.g., pruning parameter groups with low ℓ1 norm (Li et al., 2016; Liu et al., 2017), measuring the gradient-based sensitivity of parameter groups, preserving network output (He et al., 2017; Luo et al., 2017; Yu et al., 2017), and more (Suau et al., 2020; Chin et al., 2019; Huang and Wang, 2018; Molchanov et al., 2016). Pruning typically follows a three-step process of pre-training, pruning, and fine-tuning (Li et al., 2016).…”
Section: Related Work
confidence: 99%
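
The ℓ1-norm heuristic of Li et al. (2016) mentioned above can be sketched as follows: score each convolutional filter by the ℓ1 norm of its weights and keep only the highest-scoring ones, so whole output channels are removed and no sparse computation is needed. The 50% pruning ratio and the example layer are illustrative assumptions, not values from the cited work.

# A minimal sketch of l1-norm structured filter pruning (Li et al., 2016):
# rank output filters by the l1 norm of their weights and drop the lowest
# ranked, removing whole output channels. Ratio and layer sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

def l1_filter_pruning(conv, ratio=0.5):
    """Return indices of the output filters to keep in `conv`."""
    # One l1 score per output filter (sum over in_channels and kernel dims).
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(conv.out_channels * (1 - ratio)))
    keep = torch.argsort(scores, descending=True)[:n_keep]
    return torch.sort(keep).values

# Usage: build a smaller layer containing only the surviving filters.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
keep = l1_filter_pruning(conv, ratio=0.5)
pruned = nn.Conv2d(64, len(keep), kernel_size=3, padding=1)
pruned.weight.data = conv.weight.data[keep].clone()
pruned.bias.data = conv.bias.data[keep].clone()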