2021 · DOI: 10.1109/jproc.2021.3098483
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Cited by 52 publications (27 citation statements) · References 185 publications
“…Here, we focus on a summary of important techniques implemented in hardware accelerators that have explicit support for sparse computations in deep learning. Dave et al [2020] provide a comprehensive and generic survey including more architectures, techniques, and technical details on this topic. Accelerator designs are based on the observation that typical workloads have 50-90% ephemeral activation sparsity and up to 99% weight sparsity.…”
Section: Speeding Up Sparse Models
confidence: 99%
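The sparsity figures quoted above are easy to check empirically. The following is a minimal Python sketch (illustrative only, not from the surveyed work): it measures the zero fraction of a ReLU activation map, which lands near 50% for zero-mean inputs, and of a weight matrix magnitude-pruned to an assumed 90% ratio.

    import numpy as np

    def sparsity(t: np.ndarray) -> float:
        """Fraction of exactly-zero elements in a tensor."""
        return float(np.mean(t == 0))

    # Ephemeral activation sparsity: ReLU zeroes out roughly half of a
    # zero-mean input, one source of the 50-90% figure cited above.
    x = np.random.randn(256, 1024).astype(np.float32)
    acts = np.maximum(x, 0.0)

    # Static weight sparsity: magnitude pruning to a hypothetical 90%.
    w = np.random.randn(1024, 1024).astype(np.float32)
    threshold = np.quantile(np.abs(w), 0.90)
    w_pruned = np.where(np.abs(w) < threshold, 0.0, w)

    print(f"activation sparsity: {sparsity(acts):.1%}")     # ~50%
    print(f"weight sparsity:     {sparsity(w_pruned):.1%}")  # ~90%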
“…Further, we need an agile design methodology because sustaining acceleration becomes challenging as ML workloads evolve. In addition, automatic and efficient construction of the system stack is needed, as NPU architectures must adapt to new workloads by supporting specializations like sparsity or novel implementations such as mixed-precision computations [5].…”
Section: A. NPU Design Requirements and Challenges
confidence: 99%
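As a concrete illustration of the sparsity specialization this excerpt refers to, here is a minimal CSR (compressed sparse row) matrix-vector kernel in Python; the format is standard, but the code is a sketch, not the cited NPU's implementation. Storing only nonzeros means pruned weights cost neither memory traffic nor multiply-accumulates, which is what sparse accelerators exploit in hardware.

    import numpy as np

    def csr_spmv(values, col_idx, row_ptr, x):
        """Sparse matrix-vector product over a CSR-encoded matrix."""
        y = np.zeros(len(row_ptr) - 1, dtype=values.dtype)
        for row in range(len(y)):
            start, end = row_ptr[row], row_ptr[row + 1]
            # Only stored nonzeros are touched; zero weights cost no work.
            y[row] = values[start:end] @ x[col_idx[start:end]]
        return y

    # Dense equivalent: [[1, 0, 2],
    #                    [0, 0, 3]]
    values  = np.array([1.0, 2.0, 3.0])
    col_idx = np.array([0, 2, 2])
    row_ptr = np.array([0, 2, 3])
    print(csr_spmv(values, col_idx, row_ptr, np.ones(3)))  # [3. 3.]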
“…In fact, all three steps can be jointly explored, especially through an explainable DSE. Automating Comprehensive Mapping Space Formulation: The mapping space for an NPU encapsulates all schedules (aka iteration spaces in a polyhedral compiler [49], [50]) that arise from loop optimizations such as tiling, ordering, and unrolling when executing a nested loop on an NPU [4], [5], [37]. To develop a compiler for a customized NPU architecture, experts have previously formulated the mapping space manually [1], [4], [34] or relied on NPU-agnostic loop optimizations [39].…”
Section: B. End-to-End Agile Design Workflow
confidence: 99%
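To make the mapping-space idea concrete, the sketch below shows two schedules for the same matrix-multiply loop nest in Python: an untiled i-j-k ordering, and a version tiled by an assumed factor T so each block can be staged in an accelerator's local buffers. Tile sizes, loop orders, and unroll factors are exactly the axes such a mapping space enumerates; the code is illustrative, not taken from the cited compilers.

    import numpy as np

    def matmul_untiled(A, B, C):
        # One point in the mapping space: plain i-j-k loop order.
        M, K = A.shape
        _, N = B.shape
        for i in range(M):
            for j in range(N):
                for k in range(K):
                    C[i, j] += A[i, k] * B[k, j]

    def matmul_tiled(A, B, C, T=32):
        # Another point: the same computation tiled by T in every loop,
        # so each block fits an NPU's on-chip buffers. NumPy slicing
        # clips at array bounds, so ragged edge tiles are handled.
        M, K = A.shape
        _, N = B.shape
        for i0 in range(0, M, T):
            for j0 in range(0, N, T):
                for k0 in range(0, K, T):
                    C[i0:i0+T, j0:j0+T] += (
                        A[i0:i0+T, k0:k0+T] @ B[k0:k0+T, j0:j0+T])

Both schedules produce identical results; a compiler's design-space exploration searches over choices like T and the loop order to find the mapping that best fits a given NPU.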
“…The architecture employs a hybrid memory cube as the memory module for training DNNs in data centers. A thorough review of accelerators is presented in [126].…”
Section: A. Memory Systems
confidence: 99%