2021 · DOI: 10.1109/jproc.2021.3098483
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

Cited by 52 publications (27 citation statements) · References 185 publications
“…Here, we focus on a summary of important techniques implemented in hardware accelerators that have explicit support for sparse computations in deep learning. Dave et al [2020] provide a comprehensive and generic survey including more architectures, techniques, and technical details on this topic. Accelerator designs are based on the observation that typical workloads have 50-90% ephemeral activation sparsity and up to 99% weight sparsity.…”
Section: Speeding Up Sparse Models
confidence: 99%
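The sparsity figures quoted above are easy to check empirically. The following is a minimal Python sketch (illustrative only, not from the surveyed work): it measures the zero fraction of a ReLU activation map, which lands near 50% for zero-mean inputs, and of a weight matrix magnitude-pruned to an assumed 90% ratio.

    import numpy as np

    def sparsity(t: np.ndarray) -> float:
        """Fraction of exactly-zero elements in a tensor."""
        return float(np.mean(t == 0))

    # Ephemeral activation sparsity: ReLU zeroes out roughly half of a
    # zero-mean input, one source of the 50-90% figure cited above.
    x = np.random.randn(256, 1024).astype(np.float32)
    acts = np.maximum(x, 0.0)

    # Static weight sparsity: magnitude pruning to a hypothetical 90%.
    w = np.random.randn(1024, 1024).astype(np.float32)
    threshold = np.quantile(np.abs(w), 0.90)
    w_pruned = np.where(np.abs(w) < threshold, 0.0, w)

    print(f"activation sparsity: {sparsity(acts):.1%}")     # ~50%
    print(f"weight sparsity:     {sparsity(w_pruned):.1%}")  # ~90%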
“…Further, we need an agile design methodology because sustaining acceleration becomes challenging as ML workloads evolve. In addition, automatic and efficient construction of the system stack is needed, as NPU architectures must adapt to new workloads by supporting specializations like sparsity or novel implementations such as mixed-precision computations [5].…”
Section: A. NPU Design Requirements and Challenges
confidence: 99%
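As a concrete illustration of the sparsity specialization this excerpt refers to, here is a minimal CSR (compressed sparse row) matrix-vector kernel in Python; the format is standard, but the code is a sketch, not the cited NPU's implementation. Storing only nonzeros means pruned weights cost neither memory traffic nor multiply-accumulates, which is what sparse accelerators exploit in hardware.

    import numpy as np

    def csr_spmv(values, col_idx, row_ptr, x):
        """Sparse matrix-vector product over a CSR-encoded matrix."""
        y = np.zeros(len(row_ptr) - 1, dtype=values.dtype)
        for row in range(len(y)):
            start, end = row_ptr[row], row_ptr[row + 1]
            # Only stored nonzeros are touched; zero weights cost no work.
            y[row] = values[start:end] @ x[col_idx[start:end]]
        return y

    # Dense equivalent: [[1, 0, 2],
    #                    [0, 0, 3]]
    values  = np.array([1.0, 2.0, 3.0])
    col_idx = np.array([0, 2, 2])
    row_ptr = np.array([0, 2, 3])
    print(csr_spmv(values, col_idx, row_ptr, np.ones(3)))  # [3. 3.]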
“…In fact, all three steps can be jointly explored, especially through an explainable DSE. Automating Comprehensive Mapping Space Formulation: The mapping space for an NPU encapsulates all schedules (aka iteration spaces in a polyhedral compiler [49], [50]) that arise from loop optimizations such as tiling, ordering, and unrolling when executing a nested loop on an NPU [4], [5], [37]. To develop a compiler for a customized NPU architecture, experts have previously formulated the mapping space manually [1], [4], [34] or relied on NPU-agnostic loop optimizations [39].…”
Section: B. End-to-End Agile Design Workflow
confidence: 99%
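To make the mapping-space idea concrete, the sketch below shows two schedules for the same matrix-multiply loop nest in Python: an untiled i-j-k ordering, and a version tiled by an assumed factor T so each block can be staged in an accelerator's local buffers. Tile sizes, loop orders, and unroll factors are exactly the axes such a mapping space enumerates; the code is illustrative, not taken from the cited compilers.

    import numpy as np

    def matmul_untiled(A, B, C):
        # One point in the mapping space: plain i-j-k loop order.
        M, K = A.shape
        _, N = B.shape
        for i in range(M):
            for j in range(N):
                for k in range(K):
                    C[i, j] += A[i, k] * B[k, j]

    def matmul_tiled(A, B, C, T=32):
        # Another point: the same computation tiled by T in every loop,
        # so each block fits an NPU's on-chip buffers. NumPy slicing
        # clips at array bounds, so ragged edge tiles are handled.
        M, K = A.shape
        _, N = B.shape
        for i0 in range(0, M, T):
            for j0 in range(0, N, T):
                for k0 in range(0, K, T):
                    C[i0:i0+T, j0:j0+T] += (
                        A[i0:i0+T, k0:k0+T] @ B[k0:k0+T, j0:j0+T])

Both schedules produce identical results; a compiler's design-space exploration searches over choices like T and the loop order to find the mapping that best fits a given NPU.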
“…The architecture employs a hybrid memory cube as the memory module for training DNNs in data centers. A thorough review of accelerators is presented in [126].…”
Section: A. Memory Systems
confidence: 99%