Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2022
DOI: 10.1145/3503222.3507767

A full-stack search technique for domain optimized deep learning accelerators

Abstract: The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware accelerator search framework that defines a broad optimization environment covering key design decisions within the hardware-software stack, including hardware datapath, software scheduling, and compiler passes such as operation fusion and tensor padding. In this paper, we analyze b…
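The abstract describes a joint search over hardware datapath parameters, software scheduling, and compiler passes such as operation fusion and tensor padding, optimized for performance per TDP. Below is a minimal illustrative sketch of such a joint search loop in Python; the parameter names, value ranges, and the placeholder cost model are hypothetical and are not FAST's actual search space or implementation:

import random

# Hypothetical joint search space spanning hardware and compiler knobs;
# the specific parameters and values are illustrative only.
SEARCH_SPACE = {
    "pe_array": [(64, 64), (128, 128), (256, 128)],   # datapath shape
    "l2_bytes": [2**20, 4 * 2**20, 8 * 2**20],        # on-chip buffer size
    "schedule": ["output_stationary", "weight_stationary"],
    "fuse_ops": [True, False],                         # operation-fusion pass
    "pad_to":   [1, 8, 32],                            # tensor padding multiple
}

def sample_config(rng):
    # Draw one candidate design point from the joint search space.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def perf_per_tdp(config):
    # Placeholder cost model returning a deterministic pseudo-score per config;
    # a real framework would evaluate an analytical model or simulator on the
    # target workloads instead.
    rng = random.Random(str(sorted(config.items())))
    return rng.uniform(0.1, 1.0)

def random_search(trials=1000, seed=0):
    # Simple random search maximizing the (placeholder) perf/TDP objective.
    rng = random.Random(seed)
    best = max((sample_config(rng) for _ in range(trials)), key=perf_per_tdp)
    return best, perf_per_tdp(best)

if __name__ == "__main__":
    cfg, score = random_search()
    print(f"best perf/TDP score: {score:.3f}")
    print(cfg)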

Cited by 31 publications (18 citation statements)
References 76 publications (103 reference statements)

“…However, both of the works [23], [24] target 16-bit inference, while modern DNN accelerators are mainly using 8-bit precision [1]. Towards the optimization of DNN accelerators, the work in [25] presents a full-stack accelerator search technique which improves the performance per thermal design power ratio. The work in [26] transforms convolutional and fully-connected DNN layers to achieve higher performance in terms of FLOPs/sec.…”
Section: Related Work (mentioning)
confidence: 99%
“…Prior ADL-based Design Methods: Prior NPU design approaches are mostly ADL-based, as in they define an architecture template [2], [32]-[35] in an architecture description language (ADL) [36], and build a system stack around it [3]. For a template, the architecture is fixed, i.e., what kind of computation and memory units are interconnected and how.…”
Section: A. NPU Design Requirements and Challenges (mentioning)
confidence: 99%
“…An architectural template for an NPU specifies what kinds of computational and memory units can be interconnected and how. Various system stack tools for the NPU, such as cost models, simulators, and compilers, are developed manually by experts, limiting support to only the template architecture [1], [3], [4]. As workloads evolve or application requirements become stringent, novel architectural features need to be integrated and explored [5].…”
Section: Introduction (mentioning)
confidence: 99%
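Both excerpts above turn on the notion of a fixed architecture template: the template pins down which compute and memory units exist and how they are interconnected, while a concrete design instance only chooses numeric parameters. The following is a minimal sketch of what such a template might parameterize, written in plain Python rather than a real ADL; all unit names, fields, and default values are hypothetical:

from dataclasses import dataclass, field

@dataclass
class MemoryLevel:
    # One level of the fixed on-chip memory hierarchy.
    name: str
    size_bytes: int
    banks: int

@dataclass
class NPUTemplate:
    # The template fixes which units exist and how they connect;
    # a design point only picks the numeric parameters below.
    mac_rows: int = 128
    mac_cols: int = 128
    vector_lanes: int = 64
    memories: list = field(default_factory=lambda: [
        MemoryLevel("L1_weight", 256 * 1024, banks=8),
        MemoryLevel("L2_shared", 4 * 1024 * 1024, banks=16),
    ])

# One design instance within the (hypothetical) template.
design_point = NPUTemplate(mac_rows=256, mac_cols=128)
print(design_point)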
“…This enables efficient support of group or depth-wise convolution on top of the commonly used channel direction only. Many architectures do not support depth-wise convolution efficiently, resulting in a significant execution time increase [19], [20]. 2) This processor also has a transpose engine and a vector engine with N-dimension indexing to support tensor manipulations and the various vector operations required by deep learning models.…”
Section: Introduction (mentioning)
confidence: 99%
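The inefficiency of depth-wise convolution on many accelerators follows from the mapping: if a MAC array is laid out over input and output channels, a depth-wise layer activates only the "diagonal" of that array, because each output channel reads a single input channel. A back-of-the-envelope utilization estimate under that assumption (array dimensions and mapping are illustrative, not tied to the processor described in the citation):

def mac_array_utilization(cin, cout, depthwise, rows=128, cols=128):
    """Fraction of MACs doing useful work for one layer tile (illustrative)."""
    if depthwise:
        # Each output channel reads only its own input channel, so at most
        # one PE per column (the diagonal of the array) is active.
        active = min(cin, rows, cols)
    else:
        # Dense channel mixing can occupy a full cin x cout tile of PEs.
        active = min(cin, rows) * min(cout, cols)
    return active / (rows * cols)

print(mac_array_utilization(256, 256, depthwise=False))  # 1.0
print(mac_array_utilization(256, 256, depthwise=True))   # 0.0078125 (1/128)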