2020
DOI: 10.3390/s20195558

An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs

Abstract: Due to the high throughput and high computing capability of convolutional neural networks (CNNs), researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix transformation computing algorithm (MTCA) to realize the convolution expansion and resolve the block problem of the intermediate matrix. It enables highly parallel implementation on hardware. Moreover, we also provide a sp…
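The abstract describes lowering convolution into a matrix computation and then blocking the intermediate matrix for parallel hardware. As a rough, illustrative sketch of that general idea only (not the paper's actual MTCA decomposition), the snippet below shows the common im2col-style lowering in which a convolution becomes a single matrix product; the function name, shapes, and stride/padding choices are assumptions made for the example.

```python
import numpy as np

def im2col(x, k):
    """Unfold a single-channel input into a matrix so that a k x k convolution
    (stride 1, no padding) becomes one matrix multiplication."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols, (out_h, out_w)

# Example: convolving a 6x6 input with a 3x3 kernel as a matrix product.
x = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3))
cols, (oh, ow) = im2col(x, 3)
y = (cols @ kernel.ravel()).reshape(oh, ow)  # same result as direct convolution
```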

Cited by 6 publications (4 citation statements)
References 23 publications
“…In the experiments, we assume the clock frequency is 1 GHz. In addition, we assume that the CNN accelerator is in the SIMD architecture [9–14,16–18,23,26,27]. In Cnvlutin [23], Cambricon-X [9] and Dual Indexing [26], the number of PEs is 16.…”
Section: Experiments Results (mentioning)
confidence: 99%
“…To exploit the parallelism in CNNs, many CNN accelerators [9–14,16–18,23,26,27] are designed based on the single-instruction-multiple-data (SIMD) architecture. Note that the core of convolution operation is multiplication and accumulation.…”
Section: Related Work (mentioning)
confidence: 99%
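The related-work excerpt above notes that the core of convolution is multiplication and accumulation (MAC), which SIMD-style accelerators replicate across processing elements. A minimal sketch of that MAC inner loop follows; it is illustrative only, and the function and variable names are assumptions rather than anything taken from the cited papers.

```python
import numpy as np

def conv_mac(x, w):
    """Direct 2D convolution written as explicit multiply-accumulate (MAC) loops,
    the operation that SIMD-based CNN accelerators distribute across PEs."""
    k = w.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for p in range(k):
                for q in range(k):
                    acc += x[i + p, j + q] * w[p, q]  # one MAC per weight
            out[i, j] = acc
    return out

# Usage: an 8x8 input and a 3x3 kernel give a 6x6 output, one MAC chain per output pixel.
y = conv_mac(np.random.rand(8, 8), np.random.rand(3, 3))
```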
“…Since AlexNet achieved outstanding achievements in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), a lot of research teams have been devoted to the development of convolutional neural networks (CNNs) with well-known research advances such as ZFNet, GoogleNet, VGG, ResNet, etc. Owing to the increasing demand for real-time applications, an efficient dedicated hardware computation unit (i.e., a CNN accelerator) is required to support the calculations [1,2,3,4,5,6] in the inference process. Moreover, for edge devices, low power is also an important concern [7,8,9].…”
Section: Introduction (mentioning)
confidence: 99%