2020
DOI: 10.1016/j.micpro.2020.102991
CoNNa–Hardware accelerator for compressed convolutional neural networks

Cited by 14 publications (9 citation statements)
References 17 publications
“…Zero-skipping techniques [34] can avoid the multiplications by zero that result from pruning. Skipping zeros in both weights and activations may reduce performance efficiency [109]. Large on-chip memory is required in zero-skipping techniques to exploit parallel processing in hardware acceleration.…”
Section: (C) Structured Block Pruning (Pruning Based On the Lowest Av...) (mentioning)
confidence: 99%
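As a rough software illustration of the zero-skipping idea quoted above, the sketch below performs a multiply-accumulate only when both the weight and the activation are non-zero. The function and data are purely illustrative assumptions and are not taken from the CoNNa design.

```python
import numpy as np

def dot_with_zero_skipping(weights, activations):
    """Toy illustration of zero-skipping: multiply-accumulate only when
    both operands are non-zero, since a pruned (zero) weight or a zero
    activation contributes nothing to the output sum."""
    acc = 0.0
    for w, a in zip(weights, activations):
        if w == 0.0 or a == 0.0:   # skip the useless multiplication
            continue
        acc += w * a
    return acc

# Hypothetical example: a heavily pruned weight vector
w = np.array([0.0, 0.5, 0.0, 0.0, -1.2, 0.0])
a = np.array([3.0, 0.0, 2.0, 1.0,  0.5, 4.0])
print(dot_with_zero_skipping(w, a))   # only the (-1.2, 0.5) pair is computed
```

In hardware, the same principle needs extra index and control logic plus buffering of the non-zero operands, which is where the on-chip memory cost mentioned in the quotation comes from.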
“…The CNN accelerator speeds up both the backbone CNN network and the additional convolutional layers in the SSD Head. For this purpose, this paper used a modified version of the CoNNa CNN HW accelerator proposed in [22]. In turn, the Puppis HW accelerator accelerates the remaining computations of the SSD Head: softmax, bounding box, non-maximum suppression, and top-K sorting.…”
Section: System For HW Acceleration Of Complete SSD Architecture (mentioning)
confidence: 99%
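As a loose reference for the SSD-Head functions named in the statement above (softmax, top-K sorting, and non-maximum suppression), here is a minimal NumPy sketch of that post-processing chain; the box format, thresholds, and class layout are illustrative assumptions, not the Puppis implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def ssd_postprocess(logits, boxes, top_k=200, iou_thr=0.5):
    """Softmax -> top-K selection -> greedy non-maximum suppression."""
    scores = softmax(logits)[:, 1]        # assume column 1 is the foreground class
    order = np.argsort(-scores)[:top_k]   # top-K sorting by score
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return [(boxes[i], scores[i]) for i in keep]
```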
“…Some authors [140] overcame the on-chip memory limit by storing the matrix in a dense format, which requires loading all weights, including the zeros. In [141], an architecture was proposed that can skip zeros in both weights and activations; however, that solution had reduced performance efficiency.…”
Section: Hardware-oriented Deep Neural Network Optimizations (mentioning)
confidence: 99%
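To illustrate the contrast drawn above between dense weight storage and zero-skipping, the following sketch stores only the non-zero weights together with their indices and accumulates over those entries alone; the encoding is a generic illustrative assumption, not the scheme used in [140] or [141].

```python
import numpy as np

def compress_weights(dense):
    """Keep only the non-zero weights plus their positions, instead of
    loading every weight (including zeros) in dense form."""
    idx = np.flatnonzero(dense)
    return idx.astype(np.int32), dense.flat[idx]

def sparse_dot(indices, values, activations):
    """Multiply-accumulate over the stored non-zeros only."""
    return float(np.dot(values, activations.flat[indices]))

dense_w = np.array([0.0, 0.5, 0.0, -1.2, 0.0, 0.0, 2.0, 0.0])
act     = np.arange(8, dtype=np.float32)
idx, val = compress_weights(dense_w)
assert np.isclose(sparse_dot(idx, val, act), np.dot(dense_w, act))
```

The dense format trades memory for simple, regular addressing; the compressed format saves memory and multiplications but adds the index handling that tends to cost parallel efficiency, as the quoted statement notes.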