Debug information, usually encoded in the DWARF format, is a hidden and obscure component of our computing infrastructure. Debug information is obviously used by debuggers, but it also plays a key role in program analysis tools, and, most surprisingly, it can be relied upon by the runtime of high-level programming languages. For instance the C++ runtime leverages DWARF stack unwind tables to implement exceptions! Alas, generating debug information adds significant burden to compiler implementations, and the debug information itself can be pervaded by subtle bugs, making the whole infrastructure unreliable. Additionally, interpreting the debug tables is a time-consuming task and, for some applications as sampling profilers, it turns out to be a performance bottleneck. In this paper we focus on the DWARF .eh_frame table, that enables stack unwinding in absence of framepointers. We will describe two techniques to perform validation and synthesis of the DWARF stack unwinding tables, and their implementation for the x86_64 architecture. The validation tool has proven effective for compiler and inline assembly testing, while the synthesis tool can generate DWARF unwind tables for arbitrary binaries lacking debug information. Additionally, we report on a technique to precompile unwind tables into native x86_64 code, which we have implemented and integrated into libunwind, resulting in 11x-25x DWARF-based unwind speedups. CCS Concepts: • Software and its engineering → Compilers; Software testing and debugging.
In a super-scalar architecture, the scheduler dynamically assigns micro-operations (µOPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into µOPs and lists for each µOP the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the performance throughput of a sequence of instructions repeatedly executed as the core component of a loop.This paper introduces a dual equivalent representation: The resource mapping of an architecture is an abstract model where, to be executed, an instruction must use a set of abstract resources, themselves representing combinations of execution ports. For a given architecture, finding a port mapping is an important but difficult problem. Building a resource mapping is a more tractable problem and provides a simpler and equivalent model. This paper describes Palmed, a tool that automatically builds a resource mapping for pipelined, super-scalar, out-of-order CPU architectures. Palmed does not require hardware performance counters, and relies solely on runtime measurements.We evaluate the pertinence of our dual representation for throughput modeling by extracting a representative set of basic-blocks from the compiled binaries of the SPEC CPU 2017 benchmarks. We compared the throughput predicted by existing machine models to that produced by Palmed, and found comparable accuracy to state-of-the art tools, achieving sub-10 % mean square error rate on this workload on Intel's Skylake microarchitecture.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.