2017 IEEE 24th Symposium on Computer Arithmetic (ARITH)
DOI: 10.1109/arith.2017.38
A Hardware Accelerator for Computing an Exact Dot Product

Cited by 17 publications (7 citation statements) · References 7 publications
“…Google used Chisel for the design of their Edge TPU [1], and two RISC-V implementations have been proposed (Rocket Chip and BOOM), showing that the initiative can be integrated in both industrial and academic worlds. Works like [8] showed that Chisel can be used to explore different implementations of a circuit, here designed for BLAS (Basic Linear Algebra Subroutines) dot product acceleration.…”
Section: Introduction
confidence: 99%
“…• Accumulate the full-precision products into a fixed-point accumulator wide enough to eliminate all rounding errors. Kulisch suggested using a sign-magnitude representation for the exact product accumulator [27], but recent literature [28], [20], [19], [25] motivates the choice of a fixed-point two's complement representation for the w-bit wide datapath before feeding the products into a carry-save compression tree. This is similar to the quire required by the posit standard [5] for summing products.…”
Section: Matrix Multiply-Accumulate Operations
confidence: 99%
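
The statement above describes Kulisch-style exact accumulation: every product is written into a fixed-point register wide enough that no rounding can occur before the final result is produced. As a rough behavioural illustration only (not the cited hardware design), the Python sketch below models such a wide two's-complement register with arbitrary-precision integers; the widths FRAC_BITS and INT_BITS and the helper name exact_dot are assumptions chosen for binary64 inputs.

```python
from fractions import Fraction

# Assumed widths for a binary64 (double) exact accumulator; illustrative only,
# not the parameters of the accelerator described in the paper.
FRAC_BITS = 2 * 1074             # fractional bits: any product of two doubles fits
INT_BITS = 2 * 1024 + 92         # integer bits plus guard bits against carry overflow
ACC_BITS = FRAC_BITS + INT_BITS  # total register width (two's complement, sign included)

def exact_dot(xs, ys):
    """Dot product accumulated exactly in a wide two's-complement fixed-point
    register; a single rounding happens only at the final conversion."""
    acc = 0                                  # Python's big int models the wide register
    mask = (1 << ACC_BITS) - 1
    for x, y in zip(xs, ys):
        p = Fraction(x) * Fraction(y)        # exact product of the two doubles
        fixed = int(p * (1 << FRAC_BITS))    # exact fixed-point image of that product
        acc = (acc + fixed) & mask           # wrap-around two's-complement addition
    if acc >> (ACC_BITS - 1):                # reinterpret the top bit as the sign
        acc -= 1 << ACC_BITS
    return float(Fraction(acc, 1 << FRAC_BITS))  # one rounding, back to a double
```

In the actual datapath the products would be summed with a carry-save compression tree rather than one sequential addition per term; the sketch only captures the numerical behaviour of the wide accumulator.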
“…This is a special function that falls within the category of fused operations. The key observation is that, although the final result of the dot product fits in a given number of bits, the intermediate term may require many more in order not to lose accuracy [27]. Early versions of the IEEE floating-point standard (such as IEEE 754-1985) did not specify the number of bits to use for fused operations.…”
Section: B. Specific DNN Functions to Implement in Hardware
confidence: 99%
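
To see why the intermediate term needs more bits than the final result, consider a toy case run through the exact_dot sketch above (a hypothetical helper, not taken from the cited works): a naive double-precision dot product absorbs the small term into a large one and then cancels it away, while exact accumulation keeps every bit until the single final rounding.

```python
# Toy inputs chosen so that naive accumulation cancels the small term.
xs = [1e16, 1.0, -1e16]
ys = [1.0,  1.0,  1.0]

naive = sum(x * y for x, y in zip(xs, ys))  # 0.0: 1.0 is absorbed by 1e16, then cancelled
exact = exact_dot(xs, ys)                   # 1.0: every intermediate bit is preserved
print(naive, exact)
```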