Compression strategies and space-conscious representations for deep neural networks

Marinò, Giosuè Cataldo; Ghidoli, Gregorio; Frasca, Marco; Malchiodi, Dario

doi:10.1109/icpr48806.2021.9412209

Cited by 7 publications (13 citation statements); references 16 publications.


“…In this subsection we summarize our previous results obtained when compressing only FC layers via pruning, CWS, and PWS methods, considering each layer separately, that is, when each layer has its own k distinct weights [30]. They serve as a base reference for the different analyses presented here.…”

Section: Preliminary Results From Previous Studies (mentioning, confidence: 99%)
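The per-layer setting in the excerpt above (each FC layer compressed on its own, ending up with its own k distinct weights) can be sketched as follows. This is a minimal illustration, not the authors' code: we assume magnitude pruning for the pruning step and read CWS as clustering-based weight sharing implemented with a plain 1-D k-means (Lloyd's algorithm). The names `prune` and `cws`, the pruning ratio 0.5, k = 16 and the random layers are hypothetical choices.

```python
import numpy as np


def prune(w: np.ndarray, ratio: float) -> np.ndarray:
    """Magnitude pruning: zero out the `ratio` fraction of entries with smallest |w|."""
    threshold = np.quantile(np.abs(w), ratio)
    return np.where(np.abs(w) > threshold, w, 0.0)


def cws(w: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Weight sharing via 1-D k-means (Lloyd's algorithm) on the surviving weights.

    Pruned (zero) entries stay zero; every nonzero entry is replaced by its
    cluster centroid, so the layer keeps at most k distinct nonzero values.
    """
    mask = w != 0
    vals = w[mask]
    centroids = np.linspace(vals.min(), vals.max(), k)  # linear initialization
    for _ in range(iters):
        labels = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = vals[labels == j]
            if members.size:  # empty clusters keep their previous centroid
                centroids[j] = members.mean()
    labels = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
    out = np.zeros_like(w)
    out[mask] = centroids[labels]
    return out


# Each (hypothetical) layer is compressed separately, with its own k centroids.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
compressed = [cws(prune(w, ratio=0.5), k=16) for w in layers]
for w in compressed:
    print(w.shape, np.count_nonzero(w), len(np.unique(w[w != 0])))
```

Because the centroids are fitted per layer, two layers never share a codebook, which matches the "each layer has its own k distinct weights" setting rather than a network-wide one.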

“…Due to their independence from the underlying architecture and their low complexity, weight sharing quantization methods have found wide application: here, the weights are first partitioned into multiple categories, then within each category a representative value is selected and used to replace all weights in that category. Such methods mainly differ in the way they subdivide the network weights, e.g., by means of clustering techniques [29], statistical methods [30,31], uniform schemes [32], or by minimizing the distortion and the entropy of the coded source [33]. We will describe these methods in detail in Section 3.…”

Section: Weight Quantization (mentioning, confidence: 99%)
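The two-step scheme summarized above (partition the weights into categories, then replace every weight in a category by one representative) can be sketched generically. This sketch assumes the simplest partition among those listed, a uniform scheme with k equal-width bins over the weight range, and uses the category mean as representative; both choices, and the names `uniform_partition` and `share_weights`, are illustrative assumptions rather than the method of [32].

```python
import numpy as np


def uniform_partition(w: np.ndarray, k: int) -> np.ndarray:
    """Uniform scheme: label each weight with one of k equal-width bins over [min(w), max(w)]."""
    edges = np.linspace(w.min(), w.max(), k + 1)
    # digitize against the k-1 interior edges yields labels 0, ..., k-1
    return np.digitize(w, edges[1:-1])


def share_weights(w: np.ndarray, labels: np.ndarray, k: int):
    """Replace every weight in category j by one representative (here: the category mean)."""
    reps = np.array([w[labels == j].mean() if np.any(labels == j) else 0.0 for j in range(k)])
    return reps[labels], reps


rng = np.random.default_rng(1)
w = rng.normal(size=(8, 8))
labels = uniform_partition(w, k=4)
w_shared, reps = share_weights(w, labels, k=4)
print(np.unique(w_shared).size)  # at most 4 distinct values remain
```

Swapping `uniform_partition` for a clustering or quantile-based labelling changes only how categories are formed; the replacement step stays the same, which is the sense in which these methods "mainly differ in the way they subdivide the network weights".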

“…Probabilistic WS (PWS). This approach is based on a weight sharing technique named Probabilistic Quantization, recently proposed in [30] and relying on a probabilistic transformation analogous to those mapping weights onto special binary or ternary values proposed in [53,54]. Given $w_{\min} = \min W^o$ and $w_{\max} = \max W^o$, PWS is based on the following probabilistic rationale¹: suppose that each weight $w^o$ is the specification of a random variable $W^o$ with support $\mathcal{W} := [w_{\min}, w_{\max}]$.…”

Section: Share Weights (mentioning, confidence: 99%)
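One way to realize a probabilistic transformation of this kind is stochastic rounding between neighbouring representatives: a weight $w$ lying in $[c_i, c_{i+1}]$ is mapped to $c_{i+1}$ with probability $p = (w - c_i)/(c_{i+1} - c_i)$ and to $c_i$ otherwise, so its expected value is $c_i + p\,(c_{i+1} - c_i) = w$. The sketch below assumes this rule, strictly increasing representatives whose endpoints equal $w_{\min}$ and $w_{\max}$, and a hypothetical function name; it is our reading of the idea, not necessarily the exact PWS formulation in [30].

```python
import numpy as np


def probabilistic_quantize(w: np.ndarray, reps: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Stochastic rounding of each weight to one of its two neighbouring representatives.

    Assumes reps is strictly increasing with reps[0] == w.min() and reps[-1] == w.max(),
    so every weight lies in some interval [reps[i], reps[i+1]].
    """
    reps = np.asarray(reps, dtype=float)
    # left endpoint index; clipping sends w == reps[-1] to the last interval
    i = np.clip(np.searchsorted(reps, w, side="right") - 1, 0, len(reps) - 2)
    lo, hi = reps[i], reps[i + 1]
    p_up = (w - lo) / (hi - lo)  # probability of moving to the upper representative
    return np.where(rng.random(w.shape) < p_up, hi, lo)


rng = np.random.default_rng(2)
w = rng.uniform(-1.0, 1.0, size=(4, 4))
reps = np.linspace(w.min(), w.max(), 5)
draws = np.stack([probabilistic_quantize(w, reps, rng) for _ in range(10_000)])
# Averaging many independent draws recovers w: the estimator is unbiased.
print(np.abs(draws.mean(axis=0) - w).max())
```

Each single draw uses only the representatives, while the average over draws converges to the original weights; this is the unbiasedness property the next excerpt states holds for any choice of partition.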

“…It is easy to show that the matricial estimator of $W^o$ will be unbiased regardless of the chosen partition. Here we follow [30] to set $c_i$ as $c_i = v_{i/k}$, with $v_q$ the $q$-quantile of the entries of $W^o$, which induces the representatives to be scattered evenly over the support in case the elements of $W^o$ are uniformly distributed over $\mathcal{W}$. The overall time complexity amounts to $O(nm \log(nm))$, due to quantile computation.…”

Section: Share Weights (mentioning, confidence: 99%)
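The quantile-based choice $c_i = v_{i/k}$ can be sketched by sorting the $nm$ flattened weights once, which accounts for an $O(nm \log(nm))$ cost, and then reading off the needed order statistics. Assumptions: we take $i = 0, \dots, k$ so that the first and last representatives coincide with $w_{\min}$ and $w_{\max}$, and we use a nearest-rank quantile; the exact index range and quantile definition in the cited work may differ, and `quantile_representatives` is a hypothetical name.

```python
import numpy as np


def quantile_representatives(w: np.ndarray, k: int) -> np.ndarray:
    """Return c_i = v_{i/k} for i = 0, ..., k, with v_q a nearest-rank q-quantile of w.

    One O(N log N) sort of the N = n*m entries, followed by k+1 constant-time lookups.
    """
    s = np.sort(w, axis=None)  # flattened and sorted
    q = np.arange(k + 1) / k
    idx = np.rint(q * (s.size - 1)).astype(int)
    return s[idx]


rng = np.random.default_rng(3)
w = rng.uniform(-1.0, 1.0, size=(100, 100))
c = quantile_representatives(w, k=4)
print(c)  # close to [-1, -0.5, 0, 0.5, 1] for uniformly distributed weights
```

For non-uniform weights (e.g. bell-shaped FC weights) the same code places more representatives where weights are dense, which is the practical difference from a uniform grid.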

“…Among the formats specifically proposed for CNNs, in the experimental section we will leverage those yielding the highest compression ratio (to our knowledge) for the specific case. In particular, for FC quantized layers, matrices are represented via the Huffman Address Map (HAM) [30,31], and via its extension for sparse matrices, sparse HAM (sHAM), when the sparsity in W is high enough [30]. Both methods construct the Huffman code of the k source symbols, and use the codewords to compress the matrix into a single binary string.…”

Section: Storing the Shared Weights (mentioning, confidence: 99%)
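The HAM idea stated in the excerpt (build a Huffman code over the k distinct values, then concatenate the codewords of the matrix entries into one binary string) can be sketched with the standard library. This is a simplified illustration under stated assumptions: entries are visited row by row, the bit string is kept as a Python `str` of '0'/'1' characters rather than packed bits, the codebook is returned alongside it, and `huffman_code`, `ham_encode` and `ham_decode` are hypothetical names. The sparse variant sHAM is not covered.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol gets code "0"
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def ham_encode(matrix):
    """Encode a quantized matrix (list of rows) as one bit string, row by row."""
    flat = [x for row in matrix for x in row]
    code = huffman_code(flat)
    return "".join(code[x] for x in flat), code


def ham_decode(bits, code, n_cols):
    """Invert ham_encode; prefix-freeness means the first codeword match is the symbol."""
    inverse = {b: s for s, b in code.items()}
    flat, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            flat.append(inverse[buf])
            buf = ""
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


W = [[0.5, -0.25, 0.5], [0.5, 0.5, 0.0], [-0.25, 0.5, 0.5]]
bits, code = ham_encode(W)
print(len(bits))  # 12
```

With 6 occurrences of 0.5 among the 9 entries, 0.5 receives a 1-bit codeword and the other two values 2 bits each, so the matrix fits in 12 bits plus the codebook, against 288 bits for nine 32-bit floats.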
