2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca47549.2020.00015
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training

Cited by 307 publications (207 citation statements) · References 30 publications
“…There has been an incredible amount of interest in DNN hardware acceleration. Broadly speaking, the architecture community has focused on designing efficient dataflows to maximize local reuse of data and functional unit utilization [4,10,11,15,28,34,37,39], explore the space of possible dataflows and mappings [26,45,74], exploit model sparsity and data quantization [17,21,29,38,46,53,71,73,78], map DNN accelerators to FPGAs [20,66,69], and explore alternative compute, memory, and packaging technologies [35,58,59,67]. All of these works are highly relevant to this field.…”
Section: Related Work
confidence: 99%
“…To address this challenge, SIGMA [71], a GEMM accelerator for DNN training, is proposed; it can handle irregular GEMM dimensions and varying levels of sparsity while maximizing the utilization of compute resources. The Flexible Dot Product Engine (Flex-DPE) maps GEMMs of various dimensions and sparsity levels to PEs using scalable interconnects.…”
Section: Reconfigurable Interconnects
confidence: 99%
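The excerpt above describes skipping zero operands so compute resources stay busy on useful work. As a rough illustration of that idea in software (not SIGMA's actual hardware dataflow — the CSR helpers below are hypothetical names for this sketch), a sparse-times-dense GEMM over a compressed representation issues multiply-accumulates only for nonzero entries:

```python
# Illustrative sketch only: sparse x dense GEMM over a CSR-encoded operand.
# Zeros are never stored, so no multiply-accumulate work is issued for them --
# the utilization principle that sparse accelerators pursue in hardware.

def dense_to_csr(A):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # running count of nonzeros per row
    return values, col_idx, row_ptr

def csr_matmul(values, col_idx, row_ptr, B):
    """Multiply a CSR matrix by dense B; work scales with nonzeros only."""
    n_rows = len(row_ptr) - 1
    n_cols = len(B[0])
    C = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):  # nonzeros of row i
            a, j = values[k], col_idx[k]
            for c in range(n_cols):
                C[i][c] += a * B[j][c]
    return C
```

A hardware design like the one cited additionally has to route these irregular, data-dependent operand streams to a fixed pool of PEs, which is where flexible interconnects come in.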
“…al. [19] and the SIGMA [26] accelerator for sparse computation. A detailed design of the hybrid PIM is beyond the scope of this paper.…”
Section: Architectural Considerations
confidence: 99%
“…In particular, our approach shows significant improvement over baselines, namely, device-variation-aware training [21] and gradient-based protection [12], for hardware-friendly DNN models like MobileNet-V2. We evaluate the overheads of Hessian-driven parameter protection considering existing PIM and digital accelerators for sparse convolution [19,26]. The analysis shows negligible throughput overhead, but 7.5%, 19.5%, and 4.9% reductions in power efficiency compared to the baseline PIM design [26] for ResNet, MobileNet, and DenseNet, respectively.…”
Section: Introduction
confidence: 99%