Dynamic register promotion of stack variables

Li, Jianjun; Wu, Chengyou; Hsu, Wei-Chung

doi:10.1109/cgo.2011.5764671

Cited by 5 publications (4 citation statements); references 20 publications.


“…Ubiquitous memory introspection [33] detects frequently-stalling loads and adds prefetch instructions. [19] translates x86 binaries to x86-64, using the additional registers to promote stack variables. We perform much higher-level optimizations on our lifted stencils.…”

Section: Related Work (mentioning)

confidence: 99%

Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code

Mendis, Bosboom, et al. 2015

Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation


Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware.

The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. We abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions.

Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97× performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25× improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving 1.12× speedup without affecting the user experience.
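The abstract's final lifting step is to solve linear equations over buffer accesses. That step can be illustrated in miniature. The Python sketch below is not Helium's implementation. It rests on three assumptions:

- There is a single two-dimensional access into a row-major buffer.
- The byte addresses are exact integers.
- The function name `solve_affine_access` is hypothetical.

The sketch fits addr = base + sx*x + sy*y to three (x, y, address) samples using Cramer's rule. It then rejects the fit if any other sample disagrees.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_affine_access(samples):
    """Fit addr = base + sx*x + sy*y to (x, y, addr) samples (hypothetical helper).

    Returns (base, sx, sy) as Fractions, or None if the first three samples
    are degenerate or any sample disagrees with the fitted expression.
    """
    if len(samples) < 3:
        return None
    # One equation per sample: 1*base + x*sx + y*sy = addr
    a = [[Fraction(1), Fraction(x), Fraction(y)] for x, y, _ in samples[:3]]
    b = [Fraction(addr) for _, _, addr in samples[:3]]
    d = det3(a)
    if d == 0:
        return None  # these samples cannot pin down all three unknowns
    # Cramer's rule: replace column k with the right-hand side
    sol = []
    for k in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][k] = b[r]
        sol.append(det3(m) / d)
    base, sx, sy = sol
    # Every observed address must be reproduced exactly
    for x, y, addr in samples:
        if base + sx * x + sy * y != addr:
            return None
    return base, sx, sy


# Hypothetical trace: 640-wide row-major buffer of 4-byte floats at 0x1000
samples = [(0, 0, 0x1000), (1, 0, 0x1004), (0, 1, 0x1A00), (3, 2, 0x240C)]
print(solve_affine_access(samples))
```

For this hypothetical trace the fit is base 4096, x-stride 4, and y-stride 2560, which corresponds to the access buf[y][x]. Using `Fraction` keeps the arithmetic exact. As a result, the consistency check against the extra samples is an exact equality test rather than a floating-point tolerance.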


“…Jianjun et al [28] promote stack variables to registers dynamically, relying on hardware mechanism for memory disambiguation. In contrast, we provide theoretical formulations for symbol promotion without any hardware support.…”

Section: Related Work (mentioning)

confidence: 99%

A compiler-level intermediate representation based binary analysis and rewriting system

Anand, Smithson, ElWazeer, et al. 2013

Proceedings of the 8th ACM European Conference on Computer Systems


This paper presents component techniques essential for converting executables to a high-level intermediate representation (IR) of an existing compiler. The compiler IR is then employed for three distinct applications: binary rewriting using the compiler's binary back-end, vulnerability detection using source-level symbolic execution, and source-code recovery using the compiler's C backend. Our techniques enable complex high-level transformations not possible in existing binary systems, address a major challenge of input-derived memory addresses in symbolic execution and are the first to enable recovery of a fully functional source-code.

We present techniques to segment the flat address space in an executable containing undifferentiated blocks of memory. We demonstrate the inadequacy of existing variable identification methods for their promotion to symbols and present our methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable (with a stack pointer) to an abstract stack (without a stack pointer). Our methods do not use symbolic, relocation, or debug information since these are usually absent in deployed executables.

We have integrated our techniques with a prototype x86 binary framework called SecondWrite that uses LLVM as IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source-code, produced by two different compilers (gcc and Microsoft Visual Studio compiler), three languages (C, C++, and Fortran), two operating systems (Windows and Linux) and a real world program (Apache server).
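The citation statement in this entry says Li et al. promote stack variables to registers dynamically, relying on a hardware mechanism for memory disambiguation. The Python sketch below is a software-only model of that idea, built as an assumption for illustration. It works as follows:

- Direct stack accesses to promoted slots use a register copy.
- Every indirect (pointer) access is checked against the promoted slots.
- On a hit, the slot is demoted: its register value is written back to memory.

The class `PromotedFrame` and its methods are hypothetical. Demotion is only one simple recovery policy and not necessarily the one the paper uses.

```python
class PromotedFrame:
    """Toy model of stack slots promoted to registers, with an alias check
    on every indirect access (hypothetical, not the paper's mechanism)."""

    def __init__(self, promoted_slots):
        self.memory = {}  # address -> value for non-promoted storage
        # Register copies of promoted slots; they start zeroed in this toy model
        self.regs = {addr: 0 for addr in promoted_slots}
        self.alias_events = 0

    # Direct stack accesses: the translator knows the slot, so it uses the register.
    def stack_load(self, addr):
        if addr in self.regs:
            return self.regs[addr]
        return self.memory.get(addr, 0)

    def stack_store(self, addr, value):
        if addr in self.regs:
            self.regs[addr] = value
        else:
            self.memory[addr] = value

    def _check_alias(self, addr):
        # Software stand-in for hardware disambiguation: did a pointer access
        # touch a slot that currently lives in a register?
        if addr in self.regs:
            self.alias_events += 1
            # Recovery: write the register copy back and stop promoting the slot
            self.memory[addr] = self.regs.pop(addr)

    # Indirect accesses: the address comes from a pointer, so it must be checked.
    def indirect_load(self, addr):
        self._check_alias(addr)
        return self.memory.get(addr, 0)

    def indirect_store(self, addr, value):
        self._check_alias(addr)
        self.memory[addr] = value


frame = PromotedFrame([0x7FF0, 0x7FF4])
frame.stack_store(0x7FF0, 10)    # promoted slot: value stays in the register copy
frame.indirect_store(0x7FF4, 5)  # pointer store hits a promoted slot and demotes it
print(frame.stack_load(0x7FF4), frame.alias_events)
```

In this model, the common case (direct stack accesses) never touches `memory`. All the cost of an alias is paid on the rare indirect access that actually hits a promoted slot. That trade-off is what makes dynamic promotion attractive when aliasing is uncommon.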


“…However, they did not explain the details of their algorithm. Li et al [11] used a similar technique to detect accesses to aliased stack slots. In fact, pointer barrierization will not work correctly on objects in the heap if internal pointers and atomic instructions are ignored.…”

Section: Related Work (mentioning)

confidence: 99%

Continuous object access profiling and optimizations to overcome the memory wall and bloat

Odaira, Rei; Nakatani, Toshio

2012

SIGARCH Comput. Archit. News


Future microprocessors will have more serious memory wall problems since they will include more cores and threads in each chip. Similarly, future applications will have more serious memory bloat problems since they are more often written using object-oriented languages and reusable frameworks. To overcome such problems, the language runtime environments must accurately and efficiently profile how programs access objects.

We propose Barrier Profiler, a low-overhead object access profiler using a memory-protection-based approach called pointer barrierization and adaptive overhead reduction techniques. Unlike previous memory-protection-based techniques, pointer barrierization offers per-object protection by converting all of the pointers to a given object to corresponding barrier pointers that point to protected pages. Barrier Profiler achieves low overhead by not causing signals at object accesses that are unrelated to the needed profiles, based on profile feedback and a compiler analysis. Our experimental results showed Barrier Profiler provided sufficiently accurate profiles with 1.3% on average and at most 3.4% performance overhead for allocation-intensive benchmarks, while previous code-instrumentation-based techniques suffered from 9.2% on average and at most 12.6% overhead. The low overhead allows Barrier Profiler to be run continuously on production systems. Using Barrier Profiler, we implemented two new online optimizations to compress write-only character arrays and to adjust the initial sizes of mostly non-accessed arrays. They resulted in speed-ups of up to 8.6% and 36%, respectively.

