2020 57th ACM/IEEE Design Automation Conference (DAC)
DOI: 10.1109/dac18072.2020.9218746
Q-CapsNets: A Specialized Framework for Quantizing Capsule Networks

Cited by 14 publications (14 citation statements)
References 12 publications
“…In particular, in [175] it is stated that the bitwidth used for the weights can decrease toward the last layers of the NN, while the bitwidth of the activations remains more or less constant. Following these ideas, Q-CapsNets [176] analyzes the layer-wise quantization capabilities of the weights and activations of CapsNets, with a cross-layer optimization of the bitwidths and fine-grained tuning of the dynamic routing operations. Finding the optimal bitwidth for each layer of a DNN is a complex task.…”
Section: F. Full Precision vs. Quantized Implementations (mentioning, confidence: 99%)
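To make the layer-wise idea above concrete, the following is a minimal NumPy sketch of uniform symmetric round-to-nearest quantization applied with per-layer weight bitwidths that shrink toward the last layers. The layer names and bitwidths are hypothetical placeholders for illustration, not the values selected by the Q-CapsNets framework.

import numpy as np

def quantize_symmetric(x, num_bits):
    """Uniform symmetric round-to-nearest quantization to num_bits bits."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                          # "fake-quantized" values kept in float

# Hypothetical per-layer bitwidths: the weight bitwidth shrinks toward the
# last layers, mirroring the observation quoted above.
weight_bits = {"conv1": 8, "primary_caps": 6, "class_caps": 4}

rng = np.random.default_rng(0)
weights = {name: rng.standard_normal((64, 64)) for name in weight_bits}

for name, w in weights.items():
    w_q = quantize_symmetric(w, weight_bits[name])
    print(f"{name}: {weight_bits[name]}-bit weights, "
          f"mean abs quantization error {np.mean(np.abs(w - w_q)):.4f}")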
“…The edge platforms typically have limited memory and power/energy budgets, hence small-sized DNN models with a limited number of operations are desired for Edge AI applications. Model compression techniques such as pruning (structured [15], [16] or unstructured [17]-[19]) and quantization [19]-[22] are considered highly effective for reducing the memory footprint of the models as well as the number of computations required per inference. Structured pruning [15] can achieve about 4x weight-memory compression, while class-blind unstructured pruning (i.e., PruNet [18]) achieves up to 190x memory compression.…”
Section: A. Optimizations for DNN Models (mentioning, confidence: 99%)
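As a rough illustration of how unstructured pruning translates into memory compression, the sketch below applies magnitude pruning to a random weight tensor and estimates the saving under a simple value-plus-index sparse storage format. The tensor size, sparsity level, and storage format are assumptions made for this example; the 190x figure quoted above depends on the actual model and encoding.

import numpy as np

def prune_by_magnitude(w, sparsity):
    """Zero out the sparsity fraction of weights with the smallest magnitude."""
    pruned = w.copy()
    pruned[np.abs(pruned) < np.quantile(np.abs(pruned), sparsity)] = 0.0
    return pruned

w = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
w_pruned = prune_by_magnitude(w, sparsity=0.99)

dense_bytes = w.size * 4                  # dense float32 storage
nnz = int(np.count_nonzero(w_pruned))
sparse_bytes = nnz * (4 + 4)              # float32 value + 32-bit index per nonzero
print(f"~{dense_bytes / sparse_bytes:.0f}x weight-memory compression at 99% sparsity")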
“…For instance, quantization in Deep Compression [19] improves the compression rate by about 3x for the AlexNet and VGG-16 models. The Q-CapsNets framework [22] shows that quantization is also highly effective for complex DNNs such as CapsNets: it reduces the memory requirement of the CapsNet [14] by 6.2x with a negligible accuracy degradation of 0.15%.…”
Section: A. Optimizations for DNN Models (mentioning, confidence: 99%)
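The memory saving from weight quantization is essentially bitwidth arithmetic; the back-of-the-envelope sketch below shows the calculation against a 32-bit floating-point baseline. The per-layer parameter counts and bitwidths are hypothetical and do not reproduce the exact 6.2x figure reported for Q-CapsNets.

# Hypothetical per-layer parameter counts and quantized weight bitwidths,
# compared against a 32-bit float baseline.
layers = {
    # name: (parameter count, quantized weight bitwidth)
    "conv1":        (20_000,    8),
    "primary_caps": (5_300_000, 6),
    "class_caps":   (1_500_000, 4),
}

baseline_bits  = sum(n * 32 for n, _ in layers.values())
quantized_bits = sum(n * b for n, b in layers.values())
print(f"weight-memory reduction ~ {baseline_bits / quantized_bits:.1f}x")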