2018
DOI: 10.48550/arXiv.1802.04799
Preprint

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms, such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs), requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. […]

Cited by 52 publications (68 citation statements)
References 39 publications

“…PyExZ3 [31], PySym [25], flake8 [13], and Frosted [65] analyze Python source code and employ multiple heuristics to identify code issues statically [27]. XLA [64] and TVM [10] apply compiler techniques to optimize deep learning applications. Harp [74] detects inefficiencies in Tensorflow and PyTorch applications based on computation graphs.…”
Section: Existing Tools vs. PieProf
confidence: 99%
“…MNN relies on a semi-automated search technique to generate the kernels from a pre-defined number of optimization strategies [25]. TVM takes it a step further and performs compilation and autotuning for each kernel [2]. In SparseDNN, we adopt the last approach.…”
Section: Deep Learning Inference Engines
confidence: 99%
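As a concrete illustration of the per-kernel compilation and autotuning the excerpt above describes, the sketch below tunes a single matrix-multiply workload with TVM's auto_scheduler. It is a minimal sketch of the general technique, not code from the cited papers; the workload shape, trial budget, and log-file name are illustrative assumptions.

import tvm
from tvm import te, auto_scheduler

# Minimal sketch: define one kernel as a tensor-expression workload,
# then let TVM's auto-scheduler search for a fast schedule for it.
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("llvm")  # CPU back-end; shapes below are illustrative
task = auto_scheduler.SearchTask(func=matmul, args=(128, 128, 128), target=target)

log_file = "matmul_tuning.json"  # hypothetical log-file name
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=64,  # small search budget, for demonstration only
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
))

# Compile the kernel with the best schedule found during the search.
sch, args = task.apply_best(log_file)
lib = tvm.build(sch, args, target)

The search records every measured candidate to the log file, and apply_best replays the fastest schedule found when compiling the kernel; repeating this per kernel is what the excerpt means by compilation and autotuning for each kernel.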
“…In addition, the user could generate dense kernels from deep learning compiler frameworks such as TVM or Triton to use as plugins instead of oneDNN kernels if they provide better performance [2,36]. We do not explore this option in this paper.…”
Section: Optimized Dense Kernels
confidence: 99%
“…Low precision operators rely on efficient bitserial computation. We implement our operators using TVM, the deep learning compiler [3]. Our operators are designed to provide flexibility in precision and data layout, and performance portability across different CPU architectures.…”
Section: Low Precision Operators
confidence: 99%
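To make the bit-serial idea in the excerpt above concrete: with 1-bit operands packed into machine words, a dot product reduces to popcount(a AND b) accumulated over the packed words. The tensor-expression sketch below expresses this in TVM; it is a minimal sketch assuming inputs are already bit-packed, and the shapes and names are illustrative, not the cited paper's operators.

import tvm
from tvm import te

# Illustrative sketch of a 1-bit x 1-bit bit-serial dot product.
# Assumes both operands are already bit-packed into uint32 words.
W = 32  # number of packed 32-bit words (1024 logical bits); illustrative
a = te.placeholder((W,), dtype="uint32", name="a")
b = te.placeholder((W,), dtype="uint32", name="b")
k = te.reduce_axis((0, W), name="k")

# Each AND keeps the positions where both bits are set; popcount counts
# them per word; the reduction sums the counts across all packed words.
dot = te.compute(
    (1,),
    lambda _: te.sum(tvm.tir.popcount(a[k] & b[k]), axis=k),
    name="dot",
)

s = te.create_schedule(dot.op)
f = tvm.build(s, [a, b, dot], target="llvm")

Because the whole reduction runs on word-wide bitwise instructions rather than multiplies, the same expression lowers to efficient code on any CPU target TVM supports, which is the performance-portability point the excerpt makes.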