2021
DOI: 10.1109/tcad.2020.3023903

OMNI: A Framework for Integrating Hardware and Software Optimizations for Sparse CNNs

Abstract: Convolutional neural networks (CNNs), one of today's main flavors of deep learning techniques, dominate various image recognition tasks. As the model size of modern CNNs continues to grow, neural network compression techniques have been proposed to prune redundant neurons and synapses. However, prior techniques disconnect software neural network compression from hardware acceleration, and therefore fail to balance multiple design parameters, including sparsity, performance, hardware area cost, and efficiency. …
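To make the "prune redundant neurons and synapses" step concrete, here is a minimal sketch of magnitude-based weight pruning. It is an illustrative assumption, not the paper's method; the sparsity target and tensor shape are placeholders.

import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    # Zero out the smallest-magnitude weights so that roughly `sparsity` of them become zero.
    k = int(weights.size * sparsity)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights), k - 1, axis=None)[k - 1]
    mask = np.abs(weights) > threshold        # keep only the larger-magnitude synapses
    return weights * mask

# Illustrative use: prune a bank of 16 3x3 filters over 8 input channels to ~70% sparsity.
w = np.random.randn(16, 8, 3, 3).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.7)
print("sparsity:", 1.0 - np.count_nonzero(w_sparse) / w_sparse.size)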

Cited by 27 publications (15 citation statements) | References 45 publications

Citation statements (ordered by relevance):
“…Sparse tensor accelerators. [13,25,26,44,50,51,89,92] are sparse DNN accelerators. MAERI [41] uses tree-based interconnects for data distribution and reduction which is similar to our reconfigurable adder tree.…”
Section: Related Work (mentioning)
confidence: 99%
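The comparison to a reconfigurable adder tree can be read as a log-depth pairwise reduction of partial sums. The sketch below is purely illustrative (not code from MAERI or from this paper); real designs do this with hardware adders and configurable interconnects.

def adder_tree_reduce(partial_sums):
    # Reduce a list of partial sums with a binary adder tree: pairs are summed
    # level by level, so N inputs take about log2(N) levels.
    level = list(partial_sums)
    while len(level) > 1:
        if len(level) % 2:                    # pad odd-sized levels with a zero operand
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0

print(adder_tree_reduce([1, 2, 3, 4, 5]))     # -> 15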
“…For example, the latency of MRAM tends to be substantially larger than that of SRAM [58]. Moreover, the bandwidth of local memory also varies between memory blocks, depending on the number of banks allocated to those blocks (e.g., [19] and [59,60]). Thus, each PE tends to experience an order-of-magnitude difference in its latency and bandwidth, depending on which memory block the activations (or filters) are transferred from/to.…”
Section: Spatial Data Dependence Graph (mentioning)
confidence: 99%
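As a first-order illustration of that point (the latency and bandwidth numbers below are assumptions made for the sketch, not values from the cited works), the transfer time a PE sees can be modeled as a fixed access latency plus the tile size divided by the block's bandwidth:

# Assumed, illustrative parameters per memory block: (access latency in ns, bandwidth in GB/s).
MEMORY_BLOCKS = {
    "sram_bank":  (2.0, 64.0),
    "mram_block": (30.0, 8.0),
}

def transfer_time_ns(block, size_bytes):
    # First-order model: latency + size / bandwidth (1 GB/s is roughly 1 byte/ns).
    latency_ns, bw_bytes_per_ns = MEMORY_BLOCKS[block]
    return latency_ns + size_bytes / bw_bytes_per_ns

tile = 4 * 1024   # a 4 KiB activation tile
for block in MEMORY_BLOCKS:
    print(block, round(transfer_time_ns(block, tile), 1), "ns")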
“…Most of these works optimize their dataflows based on loop operations like loop interchange and loop unrolling [16-20]. The dense accelerator can result in high hardware inefficiency since most multiplication operations involve zero operands [5,6,16,21-25]. Implementation of sparse DNNs has been studied in recent years on FPGAs [26].…”
Section: Introduction (mentioning)
confidence: 99%
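To illustrate the zero-operand point in that quote (a toy example assumed for illustration, not the paper's accelerator design), a MAC loop over a pruned weight vector can simply skip multiplications whose weight is zero:

def dense_dot(weights, activations):
    # Dense MAC loop: every weight is multiplied, even when it is zero.
    return sum(w * a for w, a in zip(weights, activations))

def sparse_dot(weights, activations):
    # Sparse MAC loop: pruned (zero) weights contribute nothing, so skip them.
    return sum(w * a for w, a in zip(weights, activations) if w != 0)

w = [0.0, 0.5, 0.0, 0.0, -1.2, 0.0]   # a pruned filter row, mostly zeros
x = [3.0, 2.0, 7.0, 1.0, 4.0, 9.0]
assert dense_dot(w, x) == sparse_dot(w, x)   # same result, far fewer useful multiplies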