Good tests are important in software development, but it can be hard to tell whether tests will reveal future faults that are themselves unknown. Mutation analysis, which checks whether tests reveal inserted changes in a program, is a strong measure of test suite adequacy, but common source-or compilerlevel approaches to mutation testing are not applicable to software available only in binary form. We explore mutation analysis as an application of the reassembleable disassembly approach to binary rewriting, building a tool for x86 binaries on top of the previously-developed Uroboros system, and apply it to the C benchmarks from SPEC CPU 2006 and to five examples of embedded control software. The results demonstrate that our approach works effectively across these software domains: as expected, tests designed for performance benchmarking reveal fewer mutants than tests generated to achieve high code coverage, but mutation scores indicate differences in test origins and features such as code size and fault-tolerance. Our binary-level tool also achieves comparable results to source-level mutation analysis despite supporting a more limited set of mutation operators. More generally we also argue that our experience shows how reassembleable disassembly is a valuable approach for constructing novel binary rewriting tools.
Mutation analysis is an effective technique to evaluate a test suite adequacy in terms of revealing unforeseen bugs in software. Traditional source-or IR-level mutation analysis is not applicable to the software only available in binary format. This paper proposes a practical binary mutation analysis via binary rewriting, along with a rich set of mutation operators to represent more realistic bugs. We implemented our approach using two state-of-the-art binary rewriting tools and evaluated its effectiveness and scalability by applying them to SPEC CPU benchmarks. Our analysis revealed that the richer mutation operators contribute to generating more diverse mutants, which, compared to previous works leads to a higher mutation score for the test harness. We also conclude that the reassembleable disassembly rewriting yields better scalability in comparison to lifting to an intermediate representation and performing a full translation.
Software-based fault isolation (SFI) is a technique to isolate a potentially faulty or malicious software module from the rest of a system using instruction-level rewriting. SFI implementations on CISC architectures, including Google Native Client, use instruction padding to enforce an address layout invariant and restrict control flow. However this padding decreases code density and imposes runtime overhead. We analyze this overhead, and show that it can be reduced by allowing some execution of overlapping instructions, as long as those overlapping instructions are still safe according to the original per-instruction policy. We implemented this change for both 32-bit and 64-bit x86 versions of Native Client, and analyzed why the performance benefit is higher on 32-bit. The optimization leads to a consistent decrease in the number of instructions executed and savings averaging 8.6% in execution time (over compatible benchmarks from SPECint2006) for x86-32. We describe how to modify the validation algorithm to check the more permissive policy, and extend a machine-checked Coq proof to confirm that the system's security is preserved.
The kernel space is shared by hardware and all processes, so its memory usage is more limited, and memory is harder to reclaim, compared to user-space memory; as a result, memory leaks in the kernel can easily lead to high-impact denial of service. The problem is particularly critical in longrunning servers. Kernel code makes heavy use of dynamic (heap) allocation, and many code modules within the kernel provide their own abstractions for customized memory management. On the other hand, the kernel code involves highly complicated data flow, so it is hard to determine where an object is supposed to be released. Given the complex and critical nature of OS kernels, as well as the heavy specialization, existing methods largely fail at effectively and thoroughly detecting kernel memory leaks. In this paper, we present K-MELD, a static detection system for kernel memory leaks. K-MELD features multiple new techniques that can automatically identify specialized allocation/deallocation functions and determine the expected memoryrelease locations. Specifically, we first develop a usage-and structure-aware approach to effectively identify specialized allocation functions, and employ a new rule-mining approach to identify the corresponding deallocation functions. We then develop a new ownership reasoning mechanism that employs enhanced escape analysis and consumer-function analysis to infer expected release locations. By applying K-MELD to the Linux kernel, we confirm its effectiveness: it finds 218 new bugs, with 41 CVEs assigned. Out of those 218 bugs, 115 are in specialized modules.
Cyclic redundancy check functions (CRCs) are commonly used as checksums in hardware and software systems. Though some previous work has treated CRCs as if they would be difficult for automated reasoning or proposed using them for obfuscation, we argue that CRCs need not be difficult for symbolic execution. CRCs have a strong algebraic structure, and the key to efficient reasoning about them is exposing this structure. Common CRC implementation techniques use perbit branching or lookup tables, but these code structures can be compactly summarized by symbolic execution engines that perform path merging or create formulas reflecting the contents of a table, respectively. To evaluate the effectiveness of these techniques, we test the binary symbolic execution tools angr and FuzzBALL, and the Java bytecode tool Java Ranger, on symbolically finding CRC preimages for output sizes of 32 and 64 bits and input sizes up to 8192 bytes. The best configurations can find short CRC preimages in just a fraction of a second and long ones in under 10 seconds, demonstrating that constraint solving with CRCs is not difficult.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.