2021 58th ACM/IEEE Design Automation Conference (DAC)
DOI: 10.1109/dac18074.2021.9586216

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

Cited by 99 publications (19 citation statements)
References 26 publications
“…Approach (1) consists of solutions like VeriGOOD-ML [3], which maps ML models described in the ONNX format to three substantially different architecture templates for different types of neural networks through the PolyMath compiler. GEMMINI [5] provides a parametrized systolic array generator in Chisel that connects to a RISC-V core; the GEMMINI toolchain then offloads operations from specific layers of ONNX models to the systolic array. TVM's VTA architecture [11] is a specialized co-processor for matrix multiplication, generated through HLS for FPGA; the TVM high-level framework can compile machine learning models into a stream of instructions for VTA.…”
Section: Related Work
confidence: 99%
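To make concrete what a "parametrized systolic array generator in Chisel" looks like, here is a minimal sketch: a weight-stationary processing element and a rows x cols grid of them, where the array dimensions and datapath width are Scala generator parameters, so elaborating with different arguments yields differently sized hardware. All module and parameter names (PE, SystolicArray, rows, cols, width) are our own illustrative choices, not Gemmini's actual source.

// Minimal, hypothetical Chisel sketch of a parametrized systolic array.
import chisel3._

// One weight-stationary processing element: multiplies the streamed-in
// activation by a preloaded weight and adds the partial sum arriving
// from the PE above, registering both outputs for systolic flow.
class PE(val width: Int) extends Module {
  val io = IO(new Bundle {
    val inAct   = Input(SInt(width.W))   // activation from the left neighbor
    val inPsum  = Input(SInt(width.W))   // partial sum from the neighbor above
    val loadW   = Input(Bool())          // pulse to latch a new weight
    val inW     = Input(SInt(width.W))   // weight value to preload
    val outAct  = Output(SInt(width.W))  // activation forwarded right
    val outPsum = Output(SInt(width.W))  // partial sum forwarded down
  })
  val weight = RegInit(0.S(width.W))
  when(io.loadW) { weight := io.inW }
  // Multiply-accumulate; a real design would widen the accumulator
  // instead of truncating back to `width` bits as done here.
  val mac = io.inPsum + io.inAct * weight
  io.outAct  := RegNext(io.inAct)
  io.outPsum := RegNext(mac(width - 1, 0).asSInt)
}

// A rows x cols grid of PEs. Because rows, cols, and width are ordinary
// Scala parameters, this single description generates a family of arrays.
class SystolicArray(val rows: Int, val cols: Int, val width: Int) extends Module {
  val io = IO(new Bundle {
    val acts  = Input(Vec(rows, SInt(width.W)))          // one activation per row
    val loadW = Input(Bool())
    val wIn   = Input(Vec(rows, Vec(cols, SInt(width.W))))
    val psums = Output(Vec(cols, SInt(width.W)))         // column outputs
  })
  val pes = Seq.fill(rows, cols)(Module(new PE(width)))
  for (r <- 0 until rows; c <- 0 until cols) {
    pes(r)(c).io.loadW := io.loadW
    pes(r)(c).io.inW   := io.wIn(r)(c)
    // Activations enter from the left edge and flow right;
    // partial sums enter as zero at the top edge and flow down.
    pes(r)(c).io.inAct  := (if (c == 0) io.acts(r) else pes(r)(c - 1).io.outAct)
    pes(r)(c).io.inPsum := (if (r == 0) 0.S else pes(r - 1)(c).io.outPsum)
  }
  for (c <- 0 until cols) io.psums(c) := pes(rows - 1)(c).io.outPsum
}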
“…Energy spent on the entire benchmark model inference is calculated from the average power, the performance in terms of the number of cycles required to run the benchmark on each design, and the design's clock frequency. Figure 14 shows a PPAE comparison of AI-PiM with the equivalent Gemmini accelerator (Genc et al. (2021); Gonzalez and Hong (2020)) with an 8 × 8 systolic array. The figure shows that the loosely coupled Gemmini systolic-array accelerator takes 9.62 times the power, 18.34 times the area, and 9.36 times the energy to offer just a 3% performance improvement over AI-PiM on the ResNet-50 neural network model.…”
Section: Figure 12
confidence: 99%
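The energy calculation described in the excerpt above reduces to multiplying average power by execution time, with time recovered from the cycle count and the clock frequency. In our notation (not the cited paper's):

E_{\mathrm{inference}} = P_{\mathrm{avg}} \cdot t_{\mathrm{exec}} = P_{\mathrm{avg}} \cdot \frac{N_{\mathrm{cycles}}}{f}

For example, a design averaging 50 mW over 2,000,000 cycles at 1 GHz runs for 2 ms and spends 50 mW × 2 ms = 100 µJ per inference.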
“…VeriGOOD-ML [22] uses the PolyMath compiler [23] to map ML models in the ONNX format to three different architecture templates designed for different types of neural networks. GEMMINI [24] offloads operations from specific layers of ONNX models to a systolic array connected to a RISC-V core, after building the systolic array itself starting from a parametrized generator in Chisel. TVM's VTA architecture [25] is a configurable FPGA co-processor for matrix multiplication; the TVM high-level framework then compiles each ML model into instructions for VTA.…”
Section: Hardware Acceleration for Machine Learning
confidence: 99%
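Both excerpts describe the same offload flow: a compiler walks the ONNX graph and decides, per layer, whether an operation runs on the accelerator or stays on the host RISC-V core. The following deliberately simplified Scala sketch shows that partitioning decision; the op names, the supported-op set, and all identifiers (Layer, OffloadPartitioner, acceleratedOps) are hypothetical illustrations, not the actual Gemmini or TVM toolchain API.

// Hypothetical sketch of per-layer offload partitioning: GEMM-like ops
// go to the accelerator, everything else stays on the host core.
sealed trait Target
case object Accelerator extends Target // e.g. the systolic array
case object HostCpu     extends Target // the RISC-V core

final case class Layer(name: String, op: String)

object OffloadPartitioner {
  // Ops we assume the systolic array can execute (illustrative set).
  private val acceleratedOps = Set("MatMul", "Gemm", "Conv")

  def assign(layer: Layer): Target =
    if (acceleratedOps.contains(layer.op)) Accelerator else HostCpu

  def main(args: Array[String]): Unit = {
    val model = Seq(
      Layer("conv1", "Conv"),
      Layer("relu1", "Relu"),
      Layer("fc1",   "Gemm")
    )
    // Prints: conv1 -> Accelerator, relu1 -> HostCpu, fc1 -> Accelerator
    model.foreach(l => println(s"${l.name} (${l.op}) -> ${assign(l)}"))
  }
}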