State-of-the-art dynamic compilers often use global approaches, like Linear Scan or Graph Coloring, for register allocation. These algorithms consider the complete compilation unit for allocation, which increases the complexity of the implementation (e.g., support for lifetime holes in Linear Scan) and potentially also affects compilation time. We propose a novel non-global algorithm, which splits a compilation unit into traces based on profiling feedback and subsequently performs register allocation within each trace individually. Traces reduce the problem size to a single linear code segment, which simplifies the problem a register allocator needs to solve. Additionally, we can apply different register allocation algorithms to each trace. We show that this non-global approach can achieve results competitive to global register allocation.We present an implementation of Trace Register Allocation based on the Graal VM and show an evaluation for common Java benchmarks. We demonstrate that performance of this non-global approach is within 3% (on AMD64) and 1% (on SPARC) of global Linear Scan register allocation.
Register allocation is an integral part of compilation, regardless of whether a compiler aims for fast compilation or optimal code quality. State-of-the-art dynamic compilers often use global register allocation approaches such as linear scan. Recent results suggest that non-global trace-based register allocation approaches can compete with global approaches in terms of allocation quality. Instead of processing the whole compilation unit (i.e., method) at once, a trace-based register allocator divides the problem into linear code segments, called traces.In this work, we present a register allocation framework that can exploit the additional exibility of traces to select di erent allocation strategies based on the characteristics of a trace. This provides us with ne-grained control over the trade-o between compile time and peak performance in a just-in-time compiler.Our framework features three allocation strategies: a linearscan-based approach that achieves good code quality, a single-pass bottom-up strategy that aims for short allocation times, and an allocator for trivial traces.To demonstrate the exibility of the framework, we select 8 allocation policies and show their impact on compile time and peak performance. This approach can reduce allocation time by 7%-43% at a peak performance penalty of about 1%-11% on average.For systems that do not focus on peak performance, our approach allows to adjust the time spent for register allocation, and therefore the overall compilation time, thus nding the optimal balance between compile time and peak performance according to an application's requirements.
This paper proposes the idea of Trace Register Allocation, a register allocation approach that is tailored for just-intime (JIT) compilation in the context of virtual machines with run-time feedback. The basic idea is to offload costly operations such as spilling and splitting to less frequently executed branches and to focus on efficient registers allocation for the hot parts of a program. This is done by performing register allocation on traces instead of on the program as a whole. We believe that the basic approach is compatible to Linear Scan, the predominant register allocation algorithm for just-in-time compilation, in both code quality and allocation time, while our design leads to a simpler and more extensible solution. This extensibility allows us to add further enhancements in order to optimize the allocation based on the run-time profile of the application and thus to outperform current Linear Scan implementations.
Register allocation is a mandatory task for almost every compiler and consumes a signi cant portion of compile time. In a just-in-time compiler, compile time is a particular issue because compilation happens during program execution and contributes to the overall application run time. Parallelization can help here. We developed a theoretical model for parallel register allocation and show that it can be used in practice without a negative impact on the quality of the allocation result. Doing so reduces compilation latency, i.e., the duration until the result of a compilation is available. Our analysis shows that parallelization can theoretically decrease allocation latency by almost 50%. We implemented an initial prototype which reduces the register allocation latency by 28% when using four threads, compared to the single-threaded allocation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.