2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.430
Network Sketching: Exploiting Binary Structure in Deep CNNs

Abstract: Convolutional neural networks (CNNs) with deep architectures have substantially advanced the state-of-the-art in computer vision tasks. However, deep networks are typically resource-intensive and thus difficult to deploy on mobile devices. Recently, CNNs with binary weights have shown compelling efficiency to the community, whereas the accuracy of such models is usually unsatisfactory in practice. In this paper, we introduce network sketching as a novel technique of pursuing binary-weight CNNs, targeting …

Cited by 80 publications (102 citation statements). References 14 publications.
“…BNNs [23,40] propose to constrain both weights and activations to binary values (i.e., +1 and -1), where the multiply-accumulations can be replaced by pure xnor(·) and popcount(·) operations. To make a trade-off between accuracy and complexity, [13,15,29,48] propose to recursively perform residual quantization, yielding a series of binary tensors with decreasing magnitude scales. However, multiple binarizations are a sequential process that cannot be parallelized.…”
Section: Related Work
confidence: 99%
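For intuition on the xnor-and-popcount trick the quoted passage refers to, here is a minimal, self-contained sketch (not taken from the cited papers; the function name binary_dot and the bit-packing details are illustrative assumptions): a dot product between {-1,+1} vectors is recovered from the number of matching sign bits.

# Illustrative sketch, not from the cited papers: how a dot product of two
# {-1,+1} vectors reduces to XNOR and popcount on packed bit words.
import numpy as np

def binary_dot(a, w):
    """Dot product of two {-1,+1} vectors via XNOR + popcount.

    Encode +1 as bit 1 and -1 as bit 0; then
    dot(a, w) = 2 * popcount(xnor(a_bits, w_bits)) - n.
    """
    n = len(a)
    a_bits = np.packbits((a > 0).astype(np.uint8))
    w_bits = np.packbits((w > 0).astype(np.uint8))
    # XNOR of the packed words; drop padding bits beyond position n.
    xnor = np.unpackbits(~(a_bits ^ w_bits))[:n]
    matches = int(xnor.sum())            # popcount of the XNOR result
    return 2 * matches - n

a = np.random.choice([-1, 1], size=64)
w = np.random.choice([-1, 1], size=64)
assert binary_dot(a, w) == int(a @ w)

In a real BNN kernel the packed words are processed with hardware XNOR and popcount instructions, which is where the reported speed and energy savings come from.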
“…We explore the difference between layer-wise and group-wise design strategies. [This] approach can be treated as a kind of tensor approximation, which has similarities with the multiple-binarization methods in [13,15,29,30,48]; the differences are described in Sec. 4.…”
Section: Layer-wise vs. Group-wise Binary Decomposition
confidence: 99%
“…They have demonstrated the power of BNNs in terms of speed, memory use and power consumption. However, recent works such as [58,11,21,10] also reveal strong accuracy degradation and a mismatch issue during training when BNNs are applied to complicated tasks such as ImageNet [12] recognition, especially when the activations are binarized. Although some works like [43,50,13] have offered reasonable solutions for approximating full-precision neural networks, they still require much more computation and hyperparameter tuning to implement compared with BENN.…”
Section: Related Work
confidence: 99%
“…Compute activation a_l based on binary kernel w_l^b and input a_{l-1};
end
Backward Pass:
  Compute gradient ∂J/∂w_t based on [50, 28];
Parameter Update:
  Update w_t to w_{t+1} with any update rule (e.g., SGD or ADAM);
end
Ensemble Update:
  Pick the BNN when training converges;
  Use either a bagging or boosting algorithm to update the weight u_i of each training example i;
end
Return: K trained base classifiers for BENN;…”
confidence: 99%
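The pseudocode above outlines boosting over independently trained binary-weight base classifiers. As a rough, self-contained illustration of that loop structure (not the exact BENN procedure: a linear weak learner with sign-binarized weights stands in for a BNN, and the AdaBoost-style update of the example weights u_i is an assumption), consider:

# Hedged sketch of an ensemble of binary-weight base classifiers with
# boosting-style example re-weighting; this is an analogy, not BENN itself.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = np.sign(X @ rng.normal(size=16))          # toy labels in {-1, +1}

def fit_binary_weak_learner(X, y, u):
    # Weighted least-squares fit, then binarize the weights: w_b = alpha * sign(w).
    sw = np.sqrt(u)
    w = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    alpha = np.abs(w).mean()                  # L2-optimal scale for sign(w)
    return alpha * np.sign(w)

K = 5
u = np.ones(len(y)) / len(y)                  # per-example weights u_i
ensemble = []                                  # (beta_k, binarized weights)
for _ in range(K):
    w_b = fit_binary_weak_learner(X, y, u)
    pred = np.sign(X @ w_b)
    err = np.clip(u[pred != y].sum(), 1e-10, 1 - 1e-10)
    beta = 0.5 * np.log((1 - err) / err)      # vote of this base classifier
    u *= np.exp(-beta * y * pred)             # up-weight misclassified examples
    u /= u.sum()
    ensemble.append((beta, w_b))

votes = sum(b * np.sign(X @ w) for b, w in ensemble)
print("train accuracy:", float((np.sign(votes) == y).mean()))

The key point mirrored from the excerpt is the outer loop: each base classifier is trained and binarized on its own, after which the example weights u_i are updated before the next member of the ensemble is fitted.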
“…Wen et al. [40] propose a method as shown in (7), where s_t := ‖g_t‖_∞ := max(abs(g_t)) is a scalar parameter, ⊗ is the Hadamard product, and abs(·) returns the absolute value of each element. The method quantizes gradients to ternary values, which can effectively improve client-to-server communication in distributed learning. Guo et al. [41] propose greedy approximation, which instead tries to learn the quantization as shown in (8), where B_i is a binary filter, the α_i are optimization parameters, and input channels (c) × width (w) × height (h) is the size of the filter.…”
Section: Quantization Model of Convolutional Neural Network
confidence: 99%
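Equation (8) itself is not reproduced in the excerpt, but the greedy approximation it refers to, writing a real-valued filter as a weighted sum of binary filters Σ_i α_i B_i, can be sketched as follows. The function name sketch_filter and the fixed number of terms m are illustrative assumptions, and the coefficient-refinement step of the full network-sketching method is omitted.

# Hedged sketch of greedy residual binarization in the spirit of network
# sketching: W ≈ sum_i alpha_i * B_i with B_i in {-1,+1}^(c*w*h).
import numpy as np

def sketch_filter(W, m):
    """Greedily expand W into m binary filters B_i with scalar coefficients alpha_i."""
    residual = np.asarray(W, dtype=np.float64).copy()
    alphas, Bs = [], []
    for _ in range(m):
        B = np.sign(residual)
        B[B == 0] = 1                        # break ties for exactly-zero entries
        alpha = np.abs(residual).mean()      # = <B, residual> / (c*w*h)
        alphas.append(alpha)
        Bs.append(B)
        residual -= alpha * B                # the next term encodes what is left
    return alphas, Bs

W = np.random.randn(64, 3, 3)                # a filter of size c x w x h
alphas, Bs = sketch_filter(W, m=3)
approx = sum(a * B for a, B in zip(alphas, Bs))
print("relative error:", np.linalg.norm((W - approx).ravel()) / np.linalg.norm(W.ravel()))

Each step binarizes the current residual and fits a single scale, so the approximation error decreases monotonically as more binary terms are added, at the cost of the sequential dependence between binarizations noted in the first citation statement above.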