2019
DOI: 10.48550/arxiv.1905.12799
Preprint

Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation

Abstract: Achieving faster execution with shorter compilation time can enable further diversity and innovation in neural networks. However, the current paradigm of executing neural networks either relies on hand-optimized libraries, traditional compilation heuristics, or, very recently, simulated annealing and genetic algorithms. Our work takes a unique approach by formulating compiler optimizations for neural networks as a reinforcement learning problem, whose solution takes fewer steps to converge. This solution, dubbed …
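The abstract frames compiler optimization for DNNs as a reinforcement learning problem. As a minimal illustrative sketch only (not the paper's agent or search space), the Python below tunes two hypothetical compiler knobs, tile size and unroll factor, with tabular REINFORCE, using negative runtime as the reward; `measure_runtime` is a synthetic stand-in for compiling and timing a kernel on real hardware.

```python
# Minimal sketch, not the paper's implementation: the knob space, the
# synthetic cost surface, and the tabular REINFORCE agent are all
# illustrative assumptions standing in for a real compiler/hardware loop.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete knob space for a single kernel.
TILE_SIZES = [4, 8, 16, 32]
UNROLL_FACTORS = [1, 2, 4, 8]
KNOBS = [TILE_SIZES, UNROLL_FACTORS]


def measure_runtime(tile, unroll):
    """Stand-in for compiling a candidate configuration and timing it on
    hardware; this synthetic surface has its optimum at (16, 4)."""
    return (np.log2(tile) - 4.0) ** 2 + (np.log2(unroll) - 2.0) ** 2 + 1.0


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


# One independent categorical policy per knob (tabular REINFORCE).
logits = [np.zeros(len(choices)) for choices in KNOBS]
lr, baseline = 0.1, 0.0

for step in range(300):
    # Sample a configuration from the current stochastic policy.
    idx = [rng.choice(len(c), p=softmax(l)) for l, c in zip(logits, KNOBS)]
    tile, unroll = (c[i] for c, i in zip(KNOBS, idx))
    reward = -measure_runtime(tile, unroll)  # faster code => higher reward

    # Policy-gradient update with a running-mean baseline:
    # grad log pi(a) = onehot(a) - softmax(logits) for a categorical policy.
    advantage = reward - baseline
    baseline += 0.1 * (reward - baseline)
    for l, i in zip(logits, idx):
        grad = -softmax(l)
        grad[i] += 1.0
        l += lr * advantage * grad

best = [c[int(np.argmax(l))] for l, c in zip(logits, KNOBS)]
print("learned configuration (tile, unroll):", best)  # converges near (16, 4)
```

In the real setting the reward would come from compiling and profiling each sampled configuration on the target device, and the "adaptive sampling" in the title presumably targets reducing the number of such costly measurements.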

Cited by 4 publications (4 citation statements) · References 33 publications
“…TensorComprehensions [79] uses a genetic algorithm, AutoTVM [13], [14] uses simulated annealing and boosted trees, Reagen et al. [59] use Bayesian optimization, RELEASE [7] uses RL, ATLAS [84] uses black-box optimizations, and some compiler designs [12], [50] use profile-guided optimization to perform target-independent front-end compiler optimizations on DNNs or linear algebra computations. Some recent works use RL for HW/SW co-exploration to explore both a DNN and its mapping onto hardware [6], [32], [44], [88].…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, RL has been demonstrated within compilers/mappers [7], [24], [46], [51] for tiling and mapping DNNs over accelerators. ConfuciuX focuses on leveraging RL for exploring the search space during accelerator design.…”
Section: Introduction (mentioning)
confidence: 99%
“…To the best of our knowledge, this work is one of the first that learns the design space to generalize it. Most prior works leveraging ML for accelerator DSE [4], [7], [16], [17], [27] focus on performing the search faster. Learning the design space enables constant-time prediction of the optima.…”
Section: Workload Dims, Design Constraints (mentioning)
confidence: 99%
“…It has also been used for designing memory systems, such as prefetching [52] and memory controllers [53]. Additionally, it has been applied to DNN compilation and mapping optimization [54,55,56]. In this work, we use RL for co-exploration of data and computation mapping in NMP systems.…”
Section: Reinforcement Learning (RL) (mentioning)
confidence: 99%