2020
DOI: 10.1609/aaai.v34i04.5912

RTN: Reparameterized Ternary Network

Abstract: To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study the extremely low-bit networks which have tremendous speed-up, memory saving with quantized activation and weights. We first bring up three omitted issues in extremely low-bit networks: the squashing range of quantized values; the gradient vanishing during backpropagation and the unexploited hardware acceleration of ternary networks. By reparameterizing quantized activation and weights vect…

Cited by 24 publications (28 citation statements)
References 21 publications
“…The weights are ternarized to {+1, 0, -1} by comparing with trained or given thresholds, as equation (4) shows, where w and w_t are the original and ternarized weight values, and TH_low and TH_high are the thresholds. Modern TWNs train the weights to be ternary values [14], [15], [16], so the weights of target neural networks are already quantized into 2-bit numbers when the training finishes.…”
Section: A. Overview (mentioning)
confidence: 99%
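The quoted passage refers to its equation (4) without reproducing it. Based only on the description above (thresholds TH_low and TH_high, original weight w, ternarized weight w_t), a plausible form of the thresholding rule is the following sketch, not a reconstruction of the cited paper's exact equation:

```latex
w_t =
\begin{cases}
+1, & w > TH_{\mathrm{high}} \\
0,  & TH_{\mathrm{low}} \le w \le TH_{\mathrm{high}} \\
-1, & w < TH_{\mathrm{low}}
\end{cases}
```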
“…BWNs quantize the weights of CNNs into {+1, -1} to replace the computation-intensive multiplication operations with addition and subtraction operations for high speedup, but this aggressive quantization also leads to lower accuracy. As both the accuracy and the speed of CNNs matter, other quantization methods, including 8-bit [10], [11] and 4-bit [12], [13] integer quantization (INT8 and INT4) and ternary quantization [14], [15], [16], have been proposed to trade off speed against accuracy, as Table I shows.…”
Section: Introduction (mentioning)
confidence: 99%
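To make the speed argument in the quoted statement concrete, here is a minimal Python sketch (not code from any of the cited papers) of binary-weight quantization in which multiplication by {+1, -1} weights reduces to additions and subtractions of the activations; the mean-absolute-value scale factor is an assumption borrowed from the common XNOR-Net convention.

```python
import numpy as np

def binarize_weights(w):
    """Map real-valued weights to {+1, -1} via their sign, with a per-tensor
    scale factor equal to the mean absolute value (assumed convention)."""
    alpha = np.abs(w).mean()          # scale factor recovering magnitude
    wb = np.where(w >= 0, 1.0, -1.0)  # binary weights in {+1, -1}
    return alpha, wb

def binary_dot(alpha, wb, x):
    """Dot product with binary weights: multiplying by +/-1 reduces to
    adding or subtracting the corresponding activation."""
    return alpha * np.where(wb > 0, x, -x).sum()

# Example: compare against the full-precision dot product.
w = np.array([0.7, -0.2, 0.05, -0.9])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(binary_dot(*binarize_weights(w), x), np.dot(w, x))
```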
“…XNOR-Net++ [27] fuses the scale factors of the weights and activations in the binary convolution into a single parameter that the network can learn adaptively, addressing the problem in XNOR-Net that fixed scale factors confine the activations to a fixed interval. Similarly to XNOR-Net++, the RTN [28] network proposes an overall pipeline for reparameterizing the quantized values: the quantized activations are rescaled by a scale factor and their range is readjusted by an offset factor, achieving higher network capacity. Moreover, for 1-bit networks, performance is highly sensitive to the feature distribution of the activations.…”
Section: Binary Convolutional Neural Networks Based on Forward-Pass Improvements (unclassified)
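Based only on the description in the quoted passage (not on the RTN paper's actual formulation), a minimal PyTorch sketch of a quantized activation that is rescaled by a learnable scale factor and shifted by a learnable offset factor could look like this; the fixed threshold and the straight-through estimator are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ReparamTernaryActivation(nn.Module):
    """Hypothetical sketch (not the RTN reference code): ternarize the
    activations to {-1, 0, +1}, then reparameterize the quantized values
    with a learnable scale factor alpha and offset factor beta."""

    def __init__(self, threshold: float = 0.5):
        super().__init__()
        self.threshold = threshold                 # assumed fixed threshold
        self.alpha = nn.Parameter(torch.ones(1))   # scale factor
        self.beta = nn.Parameter(torch.zeros(1))   # offset factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hard ternarization with a symmetric threshold.
        t = torch.sign(x) * (x.abs() > self.threshold).float()
        # Straight-through estimator: quantized values in the forward pass,
        # identity gradient with respect to x in the backward pass.
        t = x + (t - x).detach()
        # Reparameterization: rescale and shift the quantized activations.
        return self.alpha * t + self.beta

# Example: quantize a random activation tensor.
y = ReparamTernaryActivation()(torch.randn(2, 4))
```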
“…Deep neural networks (DNNs) have achieved remarkable success in a wide range of applications; however, they suffer from substantial computation and energy costs. In order to obtain lightweight DNNs, network compression techniques have been widely developed in recent years, including network pruning (He, Zhang, and Sun 2017; Luo, Wu, and Lin 2017; Wen et al. 2019), quantization (Han, Mao, and Dally 2016; Wu et al. 2016; Li et al. 2020) and knowledge distillation (Hinton, Vinyals, and Dean 2015; Romero et al. 2014).…”
Section: Introduction (mentioning)
confidence: 99%