2021
DOI: 10.1109/tpds.2020.3030548
The Deep Learning Compiler: A Comprehensive Survey

Cited by 122 publications (52 citation statements) · References 33 publications
“…The evaluated memory sizes were 512 KiB and 256 MiB. The latter configuration requires no tiling, while the former is the smallest size supported by the implemented tiling methods for this network 26 .…”
Section: Discussion
confidence: 99%
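The memory thresholds quoted above come down to a simple working-set check: a layer is tiled only when its buffers exceed the memory budget. A minimal sketch of that decision, with illustrative sizes not taken from the cited paper:

```python
# Hypothetical sketch (names and sizes are illustrative, not from the survey):
# decide whether a layer's buffers fit in the memory budget or must be tiled.

def needs_tiling(input_bytes, weight_bytes, output_bytes, mem_bytes):
    """Return True if the layer's working set exceeds the memory budget."""
    return input_bytes + weight_bytes + output_bytes > mem_bytes

KIB, MIB = 1024, 1024 * 1024

# A layer with a ~1 MiB working set fits in the 256 MiB configuration but
# not in the 512 KiB one, mirroring the two sizes evaluated above.
working_set = (256 * KIB, 512 * KIB, 256 * KIB)  # input, weights, output
print(needs_tiling(*working_set, 512 * KIB))  # exceeds budget: tile
print(needs_tiling(*working_set, 256 * MIB))  # fits: no tiling needed
```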
“…It is currently not possible to connect the BYOC flow with the micro-TVM runtime, which is also still under development. This prevents the usage of TVM on (heterogeneous) embedded devices for TinyML applications; however, it can already be utilized during hardware development to evaluate the performance of prototypes with real-world test cases.…”
[Footnote 26: For convolutional layers, only a split along the output-channel dimension was implemented, as splitting along the rows and columns requires extensive effort to implement and validate all the edge cases that can occur.]
Section: Discussion
confidence: 99%
“…This allows better utilization (faster execution, lower energy consumption) of the target hardware. A detailed survey of the work is presented in [9].…”
Section: Related Work
confidence: 99%
“…A compiler takes the DL models from DL frameworks (e.g., TensorFlow [1], MXNet [5], PyTorch [22]) as input. It converts the model into multiple levels of intermediate representations (IRs) and then automatically applies various performance optimizations, based on the model's characteristics and the underlying hardware, to generate high-performance model code [19]. Although different compilers adopt different design philosophies, the fundamental procedures for generating efficient model code are similar.…”
Section: Introduction
confidence: 99%
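The pipeline described above (framework graph → multi-level IR → optimization passes → code generation) can be illustrated with a toy sketch. This is a hypothetical stand-in, not any real compiler's API; the pass shown is the classic conv+ReLU operator fusion:

```python
# Hypothetical toy pipeline: a model arrives as a flat list of high-level
# ops, an optimization pass fuses adjacent conv2d -> relu pairs, and a
# "codegen" step emits one pseudo-instruction per (possibly fused) op.

def fuse_conv_relu(ops):
    """Pattern-match adjacent conv2d -> relu pairs into one fused op,
    avoiding a round trip to memory between the two kernels."""
    fused, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] == "conv2d" and ops[i + 1] == "relu":
            fused.append("conv2d_relu")  # one kernel instead of two
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

def codegen(ops):
    """Lower the optimized IR to pseudo-instructions for a target."""
    return [f"CALL {op}" for op in ops]

graph = ["conv2d", "relu", "maxpool", "conv2d", "relu", "dense"]
lowered = fuse_conv_relu(graph)  # high-level IR -> optimized IR
print(lowered)   # ['conv2d_relu', 'maxpool', 'conv2d_relu', 'dense']
print(codegen(lowered))
```

Real compilers such as TVM perform this kind of rewriting on graph- and tensor-level IRs with far richer pattern matching, but the structure (IR in, optimized IR out, then codegen) is the same.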