Adaptable VLSI Neural Network of Tens of Thousand Connections

Han, Il Song; Ahn, Ki Hwan; Park, Tae Ho; Jun, Ki Ho

doi:10.1016/b978-0-444-89488-5.50127-5

Cited by 3 publications

(2 citation statements)

References 2 publications


“…Pruning can be done either during training or after the model training. Other than pruning, eliminating the network redundancy without retraining [16], low rank approximation [17]-[21], fast Fourier transform (FFT) based convolutions [22], [23], quantization [24], binarization [25], [26], pruning [27], [28], sparsity regularization [29], [30], pruning low magnitude weights [31]-[33] are the common approaches. Knowledge distillation [34] refers to the process of training a smaller model to replicate the behavior of a larger, pre-trained model.…”

Section: Related Work (mentioning)

confidence: 99%

Compact optimized deep learning model for edge: a review

Naveen, Kounte

2023

IJECE


Most real-time computer vision applications, such as pedestrian detection, augmented reality, and virtual reality, rely heavily on convolutional neural networks (CNN) for real-time decision support. In addition, edge intelligence is becoming necessary for low-latency real-time applications to process the data at the source device. Therefore, processing massive amounts of data impacts memory footprint, prediction time, and energy consumption, essential performance metrics in machine learning based internet of things (IoT) edge clusters. However, deploying deeper, dense, and hefty weighted CNN models on resource-constrained embedded systems and limited edge computing resources, such as memory and battery constraints, poses significant challenges in developing the compact optimized model. Reducing the energy consumption in edge IoT networks is possible by reducing the computation and data transmission between IoT devices and gateway devices. Hence there is a high demand for making energy-efficient deep learning models for deploying on edge devices. Furthermore, recent studies show that smaller compressed models achieve significant performance compared to larger deep-learning models. This review article focuses on state-of-the-art techniques of edge intelligence, and we propose a new research framework for designing a compact optimized deep learning (DL) model deployment on edge devices.


“…Compared to the related works which often use a single threshold to prune parameters for the entire network [Han et al, 2015b, Zhao et al, 2019], AAP's layer-specific thresholds allow it to generate better pruned models, and these thresholds are also fully automatically tuned.…”

Section: Layer-aware Threshold Adjustment (mentioning)

confidence: 99%

Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions

Zhao, Jain, Zhao

2023

Preprint


Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that cannot efficiently run on commodity hardware; and they often require users to manually explore and tune the pruning process, which is time-consuming and often leads to sub-optimal results. To address these limitations, this paper presents Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach to automatically generate small, accurate, and hardware-efficient models that meet user objectives. First, it proposes iterative structured pruning using activation-based attention maps to effectively identify and prune unimportant filters. Then, it proposes adaptive pruning policies for automatically meeting the pruning objectives of accuracy-critical, memory-constrained, and latency-sensitive tasks. A comprehensive evaluation shows that AAP substantially outperforms the state-of-the-art structured pruning works for a variety of model architectures. Our code is at: https://github.com/kaiqi123/Automatic-Attention-Pruning.git.

Second, we propose an adaptive pruning method that automatically optimizes the pruning process according to different user objectives. For latency-sensitive scenarios like interactive virtual assistants, we propose FLOPs-guaranteed pruning to achieve the best accuracy with the acceptable inference speed; for memory-limited environ-
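The citation statement above contrasts a single network-wide pruning threshold with layer-specific thresholds. The sketch below illustrates only that contrast and is not AAP's actual algorithm: the per-filter importance scores are made-up numbers, and the function names and the rule used to pick each layer's threshold (a fixed fraction of that layer's filters) are assumptions introduced for illustration.

```python
from typing import Dict, List

# Layer name -> one importance score per filter (illustrative values only).
Scores = Dict[str, List[float]]


def prune_global(scores: Scores, threshold: float) -> Dict[str, List[int]]:
    """Keep the filters whose score reaches one network-wide threshold."""
    return {
        layer: [i for i, s in enumerate(vals) if s >= threshold]
        for layer, vals in scores.items()
    }


def layer_threshold(vals: List[float], prune_fraction: float) -> float:
    """Hypothetical rule: pick the score that leaves roughly
    `prune_fraction` of this layer's filters below it."""
    ranked = sorted(vals)
    cut = min(int(len(ranked) * prune_fraction), len(ranked) - 1)
    return ranked[cut]


def prune_layerwise(scores: Scores, prune_fraction: float) -> Dict[str, List[int]]:
    """Keep the filters whose score reaches their own layer's threshold."""
    kept = {}
    for layer, vals in scores.items():
        t = layer_threshold(vals, prune_fraction)
        kept[layer] = [i for i, s in enumerate(vals) if s >= t]
    return kept


# Two layers whose scores live on very different scales.
scores = {
    "conv1": [0.90, 0.80, 0.70, 0.60],
    "conv2": [0.09, 0.08, 0.07, 0.06],
}

global_kept = prune_global(scores, threshold=0.5)
layer_kept = prune_layerwise(scores, prune_fraction=0.5)

print(global_kept)  # conv1 keeps every filter, conv2 keeps none
print(layer_kept)   # each layer keeps its two highest-scoring filters
```

With these toy scores, the single threshold leaves one layer untouched and empties the other, while the per-layer rule removes the same share from both. That is one way to read why layer-specific thresholds can produce better pruned models when score scales differ across layers.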


Electronic implementation of artificial neural network with URAN

Han, Ahn, Park

Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan)
