2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
DOI: 10.1109/isvlsi.2016.117

Reducing the Model Order of Deep Neural Networks Using Information Theory

Abstract: Deep neural networks are typically represented by a much larger number of parameters than shallow models, making them prohibitive for small-footprint devices. Recent research shows that there is considerable redundancy in the parameter space of deep neural networks. In this paper, we propose a method to compress deep neural networks by using the Fisher Information metric, which we estimate through a stochastic optimization method that keeps track of second-order information in the network. We first rem…
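The abstract is truncated, but its core ingredient, a Fisher Information estimate used to identify parameters that carry little information, can be sketched. The snippet below is a minimal PyTorch sketch, not the paper's implementation: it assumes a diagonal empirical Fisher accumulated from squared log-likelihood gradients, and the function names (`estimate_diagonal_fisher`, `prune_by_fisher`) and the quantile-based pruning rule are illustrative choices.

```python
import torch
import torch.nn.functional as F

def estimate_diagonal_fisher(model, data_loader, n_batches=100):
    """Accumulate a diagonal empirical Fisher estimate: the average of
    squared log-likelihood gradients for every parameter."""
    fisher = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
    model.eval()
    batches_seen = 0
    for i, (x, y) in enumerate(data_loader):
        if i >= n_batches:
            break
        model.zero_grad()
        log_probs = F.log_softmax(model(x), dim=-1)
        F.nll_loss(log_probs, y).backward()  # negative log-likelihood of the labels
        for name, p in model.named_parameters():
            if p.grad is not None:
                fisher[name] += p.grad.detach() ** 2
        batches_seen += 1
    return {name: f / max(batches_seen, 1) for name, f in fisher.items()}

def prune_by_fisher(model, fisher, keep_ratio=0.5):
    """Zero out the weights whose Fisher scores fall below a global
    quantile threshold (a crude stand-in for model-order reduction)."""
    scores = torch.cat([f.flatten() for f in fisher.values()])
    threshold = torch.quantile(scores, 1.0 - keep_ratio)
    with torch.no_grad():
        for name, p in model.named_parameters():
            p.mul_((fisher[name] >= threshold).float())
```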


Citations: cited by 8 publications (4 citation statements)
References: 28 publications
“…Here we move beyond considering the intrinsic dimensionality of datasets to study the dimension of deep representations of these datasets that networks use to classify them. Following the autoencoder example, we posit that our results may provide a foundation for future work to determine the most efficient sizes of networks that learn classification tasks [25,40,41]. For instance, if the maximum dimensionality achieved by a network is 50 in a middle layer, we conjecture that this will inform the layer size of a deep neural network that solves the task with high performance, either via standard training procedures or those that add pruning or compression steps.…”
Section: Discussion (mentioning)
confidence: 94%
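The quoted discussion hinges on measuring the "maximum dimensionality" a layer's representation reaches, but the estimator is not specified in the excerpt. The sketch below uses one common proxy, the participation ratio of the activation covariance spectrum, purely as an illustration of how such a layer-width guide could be computed; the cited work may use a different measure.

```python
import numpy as np

def participation_ratio(activations):
    """Effective dimensionality of a (samples x units) activation matrix:
    (sum of covariance eigenvalues)^2 / sum of squared eigenvalues.
    Equals the unit count for isotropic data and 1 for rank-1 data."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # guard tiny negatives
    return eig.sum() ** 2 / (eig ** 2).sum()

# Example: if a middle layer's participation ratio plateaus around 50,
# that value could guide the width chosen for a compressed network.
```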
“…The core mechanism of X-SNS lies in the derivation of sub-networks customized for individual languages and the computation of similarity between a pair of sub-networks. As a crucial component in the construction of our targeted sub-network, we introduce the Fisher information (Fisher, 1922), which provides a means of quantifying the amount of information contained in parameters within a neural network (Tu et al., 2016; Achille et al., 2019). Concretely, we derive the (empirical) Fisher information of a language model's parameters as follows.…”
Section: Proposed Method: X-SNS (mentioning)
confidence: 99%
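The quotation is cut off before the equation it announces. For reference, the diagonal empirical Fisher information that such sub-network methods typically compute takes the standard form below; the exact notation used by X-SNS may differ.

```latex
% Diagonal (empirical) Fisher information of parameter \theta_i,
% estimated over a dataset D of input--label pairs (x, y):
F_i = \frac{1}{|D|} \sum_{(x, y) \in D}
      \left( \frac{\partial \log p_{\theta}(y \mid x)}{\partial \theta_i} \right)^{2}
```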
“…For the Laplace prior, we find that up to 30% of the weights in the fully-connected layers can be pruned without a significant drop in performance. However, following the work of [34], we also explore a pruning approach that uses the Fisher Information Matrix (FIM) of the weights. As also observed by [34], pruning the weights based on the Fisher information alone does not allow for a large number of parameters to be pruned effectively because many values in the FIM diagonal are close to zero; however, combining Fisher-based pruning and magnitude-based pruning allows for a larger number of weights to be pruned (up to 60% in this case).…”
Section: Uncertainty Calibration (mentioning)
confidence: 99%
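The passage describes combining Fisher-based and magnitude-based pruning without giving the combination rule. One plausible realization, sketched below under that assumption, ranks each weight by the saliency F_i · w_i² (a diagonal second-order estimate of the loss increase from zeroing w_i) and prunes the lowest-ranked fraction; the cited work's actual rule may differ.

```python
import torch

def combined_prune_mask(weight, fisher_diag, prune_frac=0.6):
    """Blend the two criteria by scoring each weight with the saliency
    fisher_diag * weight**2, then keep only weights whose saliency lies
    above the prune_frac quantile."""
    saliency = fisher_diag * weight.pow(2)
    threshold = torch.quantile(saliency.flatten(), prune_frac)
    return (saliency >= threshold).float()

# Usage sketch for a fully-connected layer (names are illustrative):
# mask = combined_prune_mask(layer.weight.data, fisher["fc.weight"])
# layer.weight.data.mul_(mask)
```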