2021
DOI: 10.3390/s21124195

An Approximate GEMM Unit for Energy-Efficient Object Detection

Abstract: Edge computing brings artificial intelligence algorithms and graphics processing units closer to data sources, making autonomy and energy-efficient processing vital for their design. Approximate computing has emerged as a popular strategy for energy-efficient circuit design, where the challenge is to achieve the best tradeoff between design efficiency and accuracy. The essential operation in artificial intelligence algorithms is the general matrix multiplication (GEMM) operation comprised of matrix multiplicat…
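For context, the GEMM operation the abstract refers to is, in its standard BLAS-style form, the update C <- alpha*A*B + beta*C, whose core is a multiply-accumulate loop. The sketch below is only a plain-Python reference of that operation, not the paper's approximate hardware unit; the function name, the alpha/beta scaling convention, and the list-of-lists matrix layout are illustrative assumptions.

```python
def gemm(A, B, C, alpha=1.0, beta=1.0):
    """Reference GEMM: C <- alpha * (A @ B) + beta * C.

    A is n x k, B is k x m, C is n x m, all plain lists of lists.
    """
    n, k, m = len(A), len(B), len(B[0])
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]   # multiply-accumulate core
            C[i][j] = alpha * acc + beta * C[i][j]
    return C


# Example: 2x2 update with beta = 1 and an all-zero C
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
print(gemm(A, B, C))   # [[19.0, 22.0], [43.0, 50.0]]
```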


Cited by 7 publications (4 citation statements)
References 91 publications
“…As future work, we plan to improve the algorithm to implement it on heterogeneous multicore CPU/GPU architectures, as done in [36], and to optimize the portable design of memory accesses to avoid unwanted overhead on Jetson boards with low CUDA capabilities. Moreover, thanks to the introduction of the Volta architecture on the recent Nvidia Tegra series, the availability of tensor cores opens up new algorithmic designs [37]. Those devices deliver half-precision GEMM (General Matrix Multiply) in one clock cycle, consuming low energy in an edge context.…”
Section: Discussion
confidence: 99%
“…The core element of the ACTA is a dedicated Approximate General matrix multiply hardware Unit (AGU) whose accuracy can be changed on the fly. The envisioned AGU is based on the iterative logarithmic product approximation proposed by Babić et al. [13] and the design of an approximate GEMM unit presented by Pilipović et al. [14]. The AGU does not duplicate the functionality in multiple accuracy versions but incorporates simple logic to approximate the addition and multiplication constituting the GEMM operation.…”
Section: Overview of the Accuracy Tunable Accelerator (ACTA) Platform
confidence: 99%
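The iterative logarithmic product approximation mentioned in the statement above (Babić et al. [13]) replaces an exact multiplication with a leading-one approximation plus optional correction passes on the residues. The sketch below is a minimal software model of that general scheme, assuming unsigned integer operands and a configurable number of correction iterations; it is not the AGU hardware design itself, and the function name and parameters are illustrative.

```python
def ilm_multiply(a: int, b: int, corrections: int = 1) -> int:
    """Iterative logarithmic multiplication (software model).

    Each pass approximates a * b as 2^(k1+k2) + ra*2^k2 + rb*2^k1,
    where k1, k2 are the leading-one positions and ra, rb the residues;
    the leftover error ra * rb is fed into the next correction pass.
    """
    product = 0
    for _ in range(corrections + 1):
        if a == 0 or b == 0:
            break                                # remaining error is zero
        k1, k2 = a.bit_length() - 1, b.bit_length() - 1
        ra, rb = a - (1 << k1), b - (1 << k2)    # residues after the leading one
        product += (1 << (k1 + k2)) + (ra << k2) + (rb << k1)
        a, b = ra, rb                            # next pass corrects ra * rb
    return product


# With zero corrections the result is the basic log-based approximation;
# each extra pass tightens it toward the exact product.
print(ilm_multiply(200, 118, corrections=0), 200 * 118)   # 19712 23600
print(ilm_multiply(200, 118, corrections=3), 200 * 118)   # converges to 23600
```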
“…There have been several attempts to use approximate integer multipliers in neural network learning [12]-[14]. The authors of these studies report that the learning was successful, but they mainly worked with tiny neural networks.…”
Section: Introduction
confidence: 99%