Automated software testing based on fuzzing has experienced a revival in recent years. Especially feedback-driven fuzzing has become well-known for its ability to efficiently perform randomized testing with limited input corpora. Despite a lot of progress, two common problems are magic numbers and (nested) checksums. Computationally expensive methods such as taint tracking and symbolic execution are typically used to overcome such roadblocks. Unfortunately, such methods often require access to source code, a rather precise description of the environment (e.g., behavior of library calls or the underlying OS), or the exact semantics of the platform's instruction set. In this paper, we introduce a lightweight, yet very effective alternative to taint tracking and symbolic execution to facilitate and optimize state-of-the-art feedback fuzzing that easily scales to large binary applications and unknown environments. We observe that during the execution of a given program, parts of the input often end up directly (i.e., nearly unmodified) in the program state. This input-to-state correspondence can be exploited to create a robust method to overcome common fuzzing roadblocks in a highly effective and efficient manner. Our prototype implementation, called REDQUEEN, is able to solve magic bytes and (nested) checksum tests automatically for a given binary executable. Additionally, we show that our techniques outperform various state-of-the-art tools on a wide variety of targets across different privilege levels (kernel-space and userland) with no platform-specific code. REDQUEEN is the first method to find more than 100% of the bugs planted in LAVA-M across all targets. Furthermore, we were able to discover 65 new bugs and obtained 16 CVEs in multiple programs and OS kernel drivers. Finally, our evaluation demonstrates that REDQUEEN is fast, widely applicable and outperforms concurrent approaches by up to three orders of magnitude.
Fuzz testing is a well-known method for efficiently identifying bugs in programs. Unfortunately, when programs that require highly-structured inputs such as interpreters are fuzzed, many fuzzing methods struggle to pass the syntax checks: interpreters often process inputs in multiple stages, first syntactic and then semantic correctness is checked. Only if both checks are passed, the interpreted code gets executed. This prevents fuzzers from executing "deeper"-and hence potentially more interesting-code. Typically, two valid inputs that lead to the execution of different features in the target program require too many mutations for simple mutation-based fuzzers to discover: making small changes like bit flips usually only leads to the execution of error paths in the parsing engine. So-called grammar fuzzers are able to pass the syntax checks by using Context-Free Grammars. Feedback can significantly increase the efficiency of fuzzing engines and is commonly used in state-of-the-art mutational fuzzers which do not use grammars. Yet, current grammar fuzzers do not make use of code coverage, i.e., they do not know whether any input triggers new functionality. In this paper, we propose NAUTILUS, a method to efficiently fuzz programs that require highly-structured inputs by combining the use of grammars with the use of code coverage feedback. This allows us to recombine aspects of interesting inputs, and to increase the probability that any generated input will be syntactically and semantically correct. We implemented a proofof-concept fuzzer that we tested on multiple targets, including ChakraCore (the JavaScript engine of Microsoft Edge), PHP, mruby, and Lua. NAUTILUS identified multiple bugs in all of the targets: Seven in mruby, three in PHP, two in ChakraCore, and one in Lua. Reporting these bugs was awarded with a sum of 2600 USD and 6 CVEs were assigned. Our experiments show that combining context-free grammars and feedback-driven fuzzing significantly outperforms state-of-the-art approaches like AFL by an order of magnitude and grammar fuzzers by more than a factor of two when measuring code coverage. An intuitive solution to this problem is to use (context-free) grammars to generate syntactically-correct inputs. Previous works [5], [17], [36], [47] use this approach, but they do not leverage instrumentation feedback, which allows the fuzzer to distinguish inputs that reach a new part of the code base from inputs that reach no new code. Leveraging feedback led to a great improvement in the performance of general-purpose fuzzers. One of the most popular feedback-oriented fuzzers is AFL [19], which was used to identify bugs in hundreds of applications and tools. Using code coverage feedback, AFL is able to intelligently combine interesting inputs to explore deeper code, which would take an unreasonable amount of time without feedback. In contrast, AFL struggles with heavily-structured file formats since it is optimized for binary formats and does not support grammars. Note that AFL can be provided with a list of...
In this system description, we present the tool AProVE for automatic termination and complexity proofs of Java, C, Haskell, Prolog, and rewrite systems. In addition to classical term rewrite systems (TRSs), AProVE also supports rewrite systems containing built-in integers (int-TRSs). To analyze programs in high-level languages, AProVE automatically converts them to (int-)TRSs. Then, a wide range of techniques is employed to prove termination and to infer complexity bounds for the resulting rewrite systems. The generated proofs can be exported to check their correctness using automatic certifiers. To use AProVE in software construction, we present a corresponding plug-in for the popular Eclipse software development environment.
Fig. 1: AFL and AFL + IJON trying to defeat Bowser in Super Mario Bros. (Level 3-4). The lines are the traces of all runs found by the fuzzer.
Virtual machine monitors (VMMs, also called hypervisors) represent a very critical part of a modern software stack: compromising them could allow an attacker to take full control of the whole cloud infrastructure of any cloud provider. Hence their security is critical for many applications, especially in the context of Infrastructure-as-a-Service. In this paper, we present the design and implementation of HYPER-CUBE, a novel fuzzer that aims explicitly at testing hypervisors in an efficient, effective, and precise way. Our approach is based on a custom operating system that implements a custom bytecode interpreter. This high-throughput design for long-running, interactive targets allows us to fuzz a large number of both open source and proprietary hypervisors. In contrast to one-dimensional fuzzers such as AFL, HYPER-CUBE can interact with any number of interfaces in any order. Our evaluation results show that we can find more bugs (over 2×) and coverage (as much as 2×) than state-of-the-art hypervisor fuzzers. In most cases, we were even able to do so using multiple orders of magnitude less time than comparable fuzzers. HYPER-CUBE was also able to rediscover a set of well-known hypervisor vulnerabilities, such as VENOM, in less than five minutes. In total, we found 54 novel bugs, and so far obtained 43 CVEs. Our evaluation results demonstrate that nextgeneration coverage-guided fuzzers should incorporate a higherthroughput design for long-running targets such as hypervisors.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.