User-Directed Loop-Transformations in Clang

Kruse, Michael; Finkel, Hal

doi:10.1109/llvm-hpc.2018.8639402

Cited by 13 publications

(20 citation statements)

References 21 publications


“…Moreover, we illustrate two acceleration showcases for a detailed methodology discussion. As future work, we keep investigating advanced performance modeling and compiler optimizations [24] to provide better visual optimization guidance along with an increasingly higher degree of automatic optimizations. Moreover, we aim at incrementally relax methodology constraints such as required loop bounds and target frequency.…”

Section: Discussion (mentioning)

confidence: 99%

“…Free of dependencies, the optimal design performs a tiled computation unrolling the internal loop by a factor of 96 and then pipelining it, cyclic partitioning the local memory accordingly. Although with the advances in compiler technologies [24] it will be possible to widen the optimization space and require less manual intervention, the current degree of limitations potentially preventing optimal performance of state-of-the-art DSE engines makes this mixed optimization approach essential. Anyways, the final design has an estimated performance of 1.51 × 10^10 / (red triangle on blue dotted line in Figure 3) carrying a latency estimation error of only 0.000298% with respect the results provided by Vivado HLS.…”

Section: N-body Simulation Test Case (mentioning)

confidence: 99%


A CAD-based methodology to optimize HLS code via the roofline model

Siracusa, Tucci, Rabozzi et al. 2020

Proceedings of the 39th International Conference on Computer-Aided Design


“…We implemented a simple demonstration 1 in Python and slightly extended our implementation 2 of loop transformation directives from [3].…”

Section: Methods (mentioning)

confidence: 99%

“…We have been working on improved loop transformations for Clang/LLVM [3]. In addition to the loop unrolling, unroll-and-jam, vectorization, and loop distribution pragmas already supported by Clang, we added tiling, loop interchange, reversal, array packing, and thread-parallelization directives.…”

Section: Motivation (mentioning)

confidence: 99%
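The pragmas that the statement above lists as already supported by Clang can be illustrated with a minimal C sketch. The function names and loop bodies are hypothetical and chosen only for illustration; the pragma spellings are the ones documented for upstream Clang. The directives are hints: the compiler may decline a transformation it cannot prove legal, and compilers other than Clang typically ignore the unknown pragmas. The tiling, interchange, reversal, packing, and thread-parallelization directives mentioned in the quote are not shown, since their syntax is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i]. Hint: vectorize the loop and unroll it by 4. */
void scale_add(int *y, const int *x, int a, size_t n) {
    assert(y != NULL && x != NULL);
#pragma clang loop vectorize(enable) unroll_count(4)
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* c[i] += sum_j m[i][j] * v[j], with m stored row-major.
   Hint: unroll the outer loop by 2 and fuse ("jam") the inner loop copies. */
void matvec(int *c, const int *m, const int *v, size_t rows, size_t cols) {
    assert(c != NULL && m != NULL && v != NULL);
#pragma unroll_and_jam(2)
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            c[i] += m[i * cols + j] * v[j];
}

/* Two statements in one loop: the first carries a dependence from one
   iteration to the next, the second does not. Hint: loop distribution,
   which would make this a candidate for splitting into two loops. */
void prefix_and_double(int *a, int *b, const int *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
#pragma clang loop distribute(enable)
    for (size_t i = 1; i < n; ++i) {
        a[i] = a[i - 1] + c[i]; /* depends on the previous iteration */
        b[i] = c[i] * 2;        /* independent of other iterations */
    }
}
```

Each function computes the same result whether or not the hint is honored; the pragmas only change how the loop is compiled, not what it computes.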

Autotuning Search Space for Loop Transformations

Kruse, Finkel 2020, preprint (self-citation)


One of the challenges for optimizing compilers is to predict whether applying an optimization will improve its execution speed. Programmers may override the compiler's profitability heuristic using optimization directives such as pragmas in the source code. Machine learning in the form of autotuning can assist users in finding the best optimizations for each platform. In this paper we propose a loop transformation search space that takes the form of a tree, in contrast to previous approaches that usually use vector spaces to represent loop optimization configurations. We implemented a simple autotuner exploring the search space and applied it to a selected set of PolyBench kernels. While the autotuner is capable of representing every possible sequence of loop transformations and their relations, the results motivate the use of better search strategies such as Monte Carlo tree search to find sophisticated loop transformations such as multilevel tiling.
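One way to picture the tree-shaped search space described in this abstract is the C sketch below. It is an assumption-laden illustration, not the authors' implementation (which the citing paper says was a Python demonstration): every type and function name is hypothetical. Each node stands for one transformation applied to the configuration of its parent, so the path from the root to a node spells out a sequence of transformations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical set of transformation kinds; T_NONE marks the root
   (the untransformed loop nest). */
typedef enum { T_NONE, T_TILE, T_INTERCHANGE, T_REVERSE, T_UNROLL } transform_kind;

typedef struct node {
    transform_kind kind;       /* transformation applied at this step */
    int param;                 /* e.g. tile size or unroll factor; 0 if unused */
    struct node *parent;       /* NULL for the root */
    struct node *first_child;  /* further transformations tried from here */
    struct node *next_sibling;
} node;

/* Create a node and link it as a child of parent (parent may be NULL). */
node *node_new(node *parent, transform_kind kind, int param) {
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->param = param;
    n->parent = parent;
    if (parent != NULL) {
        n->next_sibling = parent->first_child;
        parent->first_child = n;
    }
    return n;
}

/* Number of transformations applied on the path from the root to n. */
size_t node_depth(const node *n) {
    size_t d = 0;
    for (; n->parent != NULL; n = n->parent)
        ++d;
    return d;
}

/* Write the transformation sequence leading to n, in application order,
   into out. Returns its length, or 0 if out is too small. */
size_t node_sequence(const node *n, transform_kind *out, size_t cap) {
    size_t d = node_depth(n);
    if (d > cap)
        return 0;
    for (size_t i = d; n->parent != NULL; n = n->parent)
        out[--i] = n->kind;
    return d;
}

/* Free a node and everything below it. */
void tree_free(node *n) {
    node *c = n->first_child;
    while (c != NULL) {
        node *next = c->next_sibling;
        tree_free(c);
        c = next;
    }
    free(n);
}
```

A search strategy, whether exhaustive or Monte Carlo tree search, would then decide which node to expand next; unlike a fixed-length parameter vector, the tree places no bound on how many transformations a configuration chains together.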


“…For loop optimizations, we have implemented several new features in LLVM and Clang, and have describe these enhancements in papers ( [65,66] and in several forums directly to the LLVM community (including talks at the LLVM developers' meetings, on the LLVM mailing lists)).…”

Section: Recent Progress For Parallelism We Have Implemented Several (mentioning)

confidence: 99%

ECP Software Technology Capability Assessment Report

Heroux, Carter et al. 2018


The Exascale Computing Project (ECP) Software Technology (ST) Focus Area is responsible for developing critical software capabilities that will enable successful execution of ECP applications, and for providing key components of a productive and sustainable Exascale computing ecosystem that will position the US Department of Energy (DOE) and the broader high performance (HPC) community with a firm foundation for future extreme-scale computing capabilities. This ECP ST Capability Assessment Report (CAR) provides an overview and assessment of current ECP ST capabilities and activities, giving stakeholders and the broader HPC community information that can be used to assess ECP ST progress and plan their own efforts accordingly. ECP ST leaders commit to updating this document on regular basis (every six to 12 months). Highlights from this version of the report are presented here.

What is new in CAR V2.0: CAR V2.0 contains the following updates relative to CAR V1.5.

• We introduce the FY20-23 project structure. ECP ST now consists of 6 (up from 5) level-3 (L3) technical areas, introducing the NNSA ST L3 area, which brings into one L3 the ECP open source development efforts at NNSA labs that are of particular importance to the rest of ECP. The number of ECP ST level-4 (L4) subprojects has been reduced from 55 to 33. The strategic aggregation of projects into fewer and larger units enables us to better manage L4 subprojects consistently as a portfolio. See Section 1.2.
• We describe new and enhanced project management processes and resources including our iterative planning process, new KPP-3 capability integration process, a product dictionary and dependency management database. These new project features and related dashboards enable more insight and better information to effectively manage efforts across ECP. See Section 2.
• The two-page summaries of each ECP L4 projects have been updated to reflect recent progress and next steps. See Section 4.
• The Extreme-scale Scientific Software Stack (E4S) is further described. The third release, which is also the first major public release Version 1.0, was November 18, 2019. E4S is the primary integration and delivery vehicle for ECP ST capabilities. See Section 2.1.1.
• The ECP ST SDK effort has further refined its groupings. See Section 2.1.2.

The Exascale Computing Project Software Technology (ECP ST) focus area represents the key bridge between Exascale systems and the scientists developing applications that will run on those platforms. ECP ST efforts contribute to 70 software products (Section 2.1.3) in six technical areas (Table 1). Since the publishing of CAR V1.5, we have introduced a product dictionary of official product names, which enables more rigorous mapping of ECP ST deliverables to stakeholders (Section 2.1.4).

Programming Models & Runtimes: In addition to developing key enhancements to MPI and OpenMP for scalable systems with accelerated node architectures, we are working on performance portability layers (Kokkos and RAJA) and participating in OpenMP and O...
