2019
DOI: 10.1145/3306346.3322967

Learning to Optimize Halide with Tree Search and Random Programs

Abstract: We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the performance of previous methods, which considered a limited subset of schedules. We define a parameterization of possible schedules much larger than prior methods and use a variant of beam search to search over it. The search optimizes runtime predicted by a cost model based on a combination of new derived features and machine learning. We train the cos…
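As a rough illustration of the beam search idea mentioned in the abstract, the sketch below runs a generic beam search over a toy schedule space and ranks candidates by a predicted cost. It is not the paper's algorithm or cost model: the names `beam_search` and `toy_cost`, the tile-size choices, and the cost formula are all hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple

# A "schedule" here is just one choice (e.g. a tile size) per pipeline stage.
# This is a toy representation, assumed for illustration only.
Schedule = Tuple[int, ...]


def beam_search(
    num_stages: int,
    choices: Sequence[int],
    predicted_cost: Callable[[Schedule], float],
    beam_width: int = 4,
) -> Schedule:
    """Extend partial schedules one stage at a time, keeping only the
    beam_width candidates with the lowest predicted cost at each step."""
    beam: List[Schedule] = [()]
    for _ in range(num_stages):
        # Expand every partial schedule in the beam with every choice.
        candidates = [partial + (c,) for partial in beam for c in choices]
        # Rank by the cost model's prediction and prune to the beam width.
        candidates.sort(key=predicted_cost)
        beam = candidates[:beam_width]
    return beam[0]


def toy_cost(schedule: Schedule) -> float:
    # Hypothetical stand-in for a learned cost model: prefers tile size 16
    # and penalizes changing tile size between adjacent stages.
    base = sum(abs(t - 16) for t in schedule)
    mismatch = sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)
    return base + 0.5 * mismatch


best = beam_search(num_stages=3, choices=[8, 16, 32], predicted_cost=toy_cost)
print(best)
```

In this sketch the cost function is evaluated on partial schedules as well as complete ones, which is what lets the search prune early; a real system would replace `toy_cost` with a model trained on measured runtimes.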

Cited by 164 publications (179 citation statements)
References 19 publications (26 reference statements)
“…We use a 3.4 GHz, quad-core Intel i5-4670 CPU with 16GB RAM and two GPUs (each experiment uses a single GPU): an NVIDIA GTX 1080Ti and an NVIDIA Tesla V100 (Table 1 lists their key specifications). For our benchmarks, we use six canonical image processing applications that have appeared in prior work [6,14,18,19,22]. Table 2 reports the number of stages and the size of the input image for each benchmark.…”
Section: Discussion (mentioning)
confidence: 99%
“…The final problem involves choosing tile and block sizes. We present an automatic fusion algorithm that considers key factors affecting the performance of GPU kernels which are not considered in previous work [6,17,18]: 1) number of global memory transactions, 2) achieved and theoretical occupancy, 3) GPU resource usage, and 4) fraction of overlapping computations.…”
Section: Dynamic Programming Fusion (mentioning)
confidence: 99%
“…Auto-Tuning approaches including Halide's auto-tuners [1,23], OpenTuner [2], ATF [24], and program synthesis techniques such as SwizzleInventor [20] aim to automatically develop optimized code using design space exploration. We aim to automatically synthesize Fireiron strategies in the future but in its current version it is designed as a tool for human performance experts.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent research shows growing interest in automatic whole-program optimization techniques [18][19][20], but approaches are preliminary and typically focus on optimizing only one aspect of a program at a time. There is no doubt that multi-dimensional whole program optimization is a hard task, but we can perhaps take some hope from the recent success of hybrid search/learning approaches such as AlphaGo [21] that show promise in finding good solutions within huge combinatorial search spaces.…”
Section: Manual Vs Automatic Search Strategies (mentioning)
confidence: 99%