2015 International Conference on Parallel Architecture and Compilation (PACT)
DOI: 10.1109/pact.2015.21
Towards General-Purpose Neural Network Computing

Abstract: Machine learning is becoming pervasive; decades of research in neural network computation are now being leveraged to learn patterns in data and perform computations that are difficult to express using standard programming approaches. Recent work has demonstrated that custom hardware accelerators for neural network processing can outperform software implementations in both performance and power consumption. However, there is neither an agreed-upon interface to neural network accelerators nor a consensus…

Cited by 17 publications (4 citation statements)
References 31 publications
“…Accelerator design for neural networks has become a major line of computer architecture research in recent years. A handful of prior works have explored the design space of neural network acceleration, which can be categorized into ASICs [15], [16], [18]-[22], [26], [27], [30], [34], [37], [38], [41], [42], FPGA implementations [17], [28], [35], [36], [43], the use of unconventional devices for acceleration [29], [33], [40], and dataflow optimizations [16], [23]-[25], [31], [32], [39]. Most of these studies have focused on accelerator design and optimization for merely one specific type of convolution, as the most compute-intensive operation in deep convolutional neural networks.…”
Section: -G (citation type: mentioning)
confidence: 99%
“…The study in [88] proposes a k-modular redundancy approach, created by replicating each hidden neuron k-fold and dividing its weights by k, in contrast to the voting mechanism of conventional methods. The authors evaluated the method for three applications on the NN accelerator: Black-Scholes, RSA, and Sobel.…”
Section: Resilience Enhancement by Redundancy (citation type: mentioning)
confidence: 99%
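The replication scheme in [88] can be stated concretely: if every hidden neuron is copied k times and each copy's outgoing weights are divided by k, the fault-free output of the next layer is numerically unchanged, while a single faulty copy can perturb each output sum by only 1/k of that neuron's contribution. Below is a minimal, hypothetical sketch of that transformation for one dense layer; the plain-array representation and the names replicate and forward are illustrative assumptions, not the cited accelerator design.

```scala
// Hypothetical sketch of k-modular redundancy as described in [88]:
// replicate each hidden neuron k times and divide its outgoing weights
// by k, so the fault-free result is unchanged.
object KModularRedundancy {
  // Copy each hidden neuron's outgoing weight row k times, scaled by 1/k.
  def replicate(wOut: Array[Array[Double]], k: Int): Array[Array[Double]] =
    wOut.flatMap(row => Array.fill(k)(row.map(_ / k)))

  // Next-layer pre-activations: out(j) = sum over i of hidden(i) * wOut(i)(j).
  def forward(hidden: Array[Double], wOut: Array[Array[Double]]): Array[Double] = {
    val out = new Array[Double](wOut.head.length)
    for (i <- hidden.indices; j <- out.indices)
      out(j) += hidden(i) * wOut(i)(j)
    out
  }

  def main(args: Array[String]): Unit = {
    val k = 3
    val wOut   = Array(Array(1.0, 2.0), Array(3.0, 4.0)) // 2 hidden -> 2 outputs
    val hidden = Array(0.5, -1.0)
    // k copies of every hidden activation, in the same order as the rows above.
    val hRep = hidden.flatMap(h => Array.fill(k)(h))
    println(forward(hidden, wOut).mkString(", "))             // baseline: -2.5, -3.0
    println(forward(hRep, replicate(wOut, k)).mkString(", ")) // identical with redundancy
  }
}
```

Unlike classical k-modular redundancy, no voter is needed here: the averaging is folded into the weights themselves, which is why the excerpt contrasts the approach with the voting mechanism of conventional methods.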
“…Research in this area has traditionally employed HLS tools as "compiler" intermediaries between high-level implementations and the actual hardware design [48], which suffer from unpredictable resource usage. Recently, exploratory work has used Chisel as both the high-level implementation language and the low-level description language [49]. While this is still in its initial stages, we have begun to explore and develop such a solution for Bragg peak detection [50].…”
Section: Lightweight AI Capability for Future Detector Systems (citation type: mentioning)
confidence: 99%
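For readers unfamiliar with Chisel, the excerpt above refers to using it as a single language spanning the high-level implementation and the low-level hardware description. The following is a minimal, generic Chisel 3 sketch of a multiply-accumulate (MAC) unit, the kind of primitive a neural-network datapath is assembled from; it illustrates the language style only and is not the design from [49] or the Bragg peak detector of [50].

```scala
// Generic illustration of Chisel 3 style: a signed multiply-accumulate
// unit, not the design from the cited works.
import chisel3._

class Mac(width: Int) extends Module {
  val io = IO(new Bundle {
    val a     = Input(SInt(width.W))         // multiplicand
    val b     = Input(SInt(width.W))         // multiplier
    val clear = Input(Bool())                // synchronously reset the accumulator
    val acc   = Output(SInt((2 * width).W))  // running sum of products
  })
  // Accumulator register, wide enough to hold a full-width product.
  val accReg = RegInit(0.S((2 * width).W))
  accReg := Mux(io.clear, 0.S, accReg + (io.a * io.b))
  io.acc := accReg
}
```

Because Chisel elaborates a hardware graph from a Scala program, the same source can be parameterized (here by width) and then lowered to Verilog, which is the property that makes it attractive as both the high-level and low-level language in the work cited above.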