The efficacy of Anomaly Detection (AD) sensors depends heavily on the quality of the data used to train them. Artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets are dirty; that is, they contain a number of attacks or anomalous events. The size of these realistic training data sets makes manual removal or labeling of attack data infeasible. As a result, sensors trained on this data can miss attacks and their variations. We propose extending the training phase of AD sensors (in a manner agnostic to the underlying AD algorithm) to include a sanitization phase. This phase generates multiple models conditioned on small slices of the training data. We use these "micro-models" to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as "attack-free" and "regular" as possible in the absence of absolute ground truth. We also show how a collaborative approach that combines models from different networks or domains can further refine the sanitization process to thwart targeted training or mimicry attacks against a single site.
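The micro-model voting scheme described above can be illustrated with a minimal sketch. The function name, the use of simple set-membership "models," and the threshold semantics are all illustrative assumptions, not the authors' actual implementation:

```python
# Hypothetical sketch of micro-model voting for training-data sanitization.
# Each micro-model here is just the set of inputs seen in one slice; a real
# AD sensor would substitute its own model-building and scoring functions.
def sanitize(training_data, slice_size, vote_threshold):
    """Keep an input only if most micro-models consider it normal."""
    # 1. Partition the unlabeled training data into small slices.
    slices = [training_data[i:i + slice_size]
              for i in range(0, len(training_data), slice_size)]
    # 2. Train one micro-model per slice (placeholder: set of seen inputs).
    micromodels = [set(s) for s in slices]
    # 3. Each micro-model votes "abnormal" on inputs it has not seen;
    #    inputs flagged by too many models are dropped as likely attacks.
    sanitized = []
    for item in training_data:
        abnormal_votes = sum(1 for m in micromodels if item not in m)
        if abnormal_votes / len(micromodels) < vote_threshold:
            sanitized.append(item)  # kept: most models consider it normal
    return sanitized
```

The intuition is that an attack appears in only a few time slices, so most micro-models vote it abnormal, while genuinely normal traffic recurs across slices and survives the vote.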
The increasing array of Internet-scale threats is a pressing problem for every organization that utilizes the network. Organizations have limited resources to detect and respond to these threats. The end-to-end (E2E) sharing of information related to probes and attacks is a facet of an emerging trend toward "collaborative security." The key benefit of a collaborative approach to intrusion detection is a better view of global network attack activity. Augmenting the information obtained at a single site with information gathered from across the network can provide a more precise model of an attacker's behavior and intent. While many organizations see value in adopting such a collaborative approach, some challenges must be addressed before intrusion detection can be performed on an interorganizational scale. We report on our experience developing and deploying a decentralized system for efficiently distributing alerts to collaborating peers. Our system, Worminator, extracts relevant information from alert streams and encodes it in Bloom filters. This information forms the basis of a distributed watchlist. The watchlist can be distributed via a choice of mechanisms ranging from a centralized trusted third party to a decentralized P2P-style overlay network.
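A Bloom filter lets a site share a watchlist without revealing the raw alert data (such as source IP addresses), since membership can be tested but entries cannot be enumerated. The following is a minimal generic sketch; the sizing, hash construction, and interface are illustrative and not Worminator's actual design:

```python
# Minimal Bloom-filter sketch of a shareable watchlist. Parameters are
# illustrative; real deployments size the bit array and hash count to a
# target false-positive rate.
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # May return a false positive, never a false negative; the raw
        # alert content is never exposed to collaborating peers.
        return all(self.bits[pos] for pos in self._positions(item))
```

A peer receiving only the bit array can check whether a locally observed source appears on the shared watchlist, which is what makes the structure suitable for cross-organizational exchange.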
In this paper, we present ShieldGen, a system for automatically generating a data patch or a vulnerability signature for an unknown vulnerability, given a zero-day attack instance. The key novelty in our work is that we leverage knowledge of the data format to generate new potential attack instances, which we call probes, and use a zero-day detector as an oracle to determine whether an instance can still exploit the vulnerability; the feedback of the oracle guides our search for the vulnerability signature. We have implemented a ShieldGen prototype and experimented with three known vulnerabilities. The generated signatures have no false positives and a low rate of false negatives due to imperfect data format specifications and the sampling technique used in our probe generation. Overall, they are significantly more precise than the signatures generated by existing schemes. We have also conducted a detailed study of 25 vulnerabilities for which Microsoft issued security bulletins between 2003 and 2006. We estimate that ShieldGen can produce high-quality signatures for a large portion of those vulnerabilities and that these signatures are superior to those generated by existing schemes.
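The oracle-guided search can be sketched abstractly: mutate one field of the attack instance at a time and ask the detector whether the probe still exploits the vulnerability. The field/probe representation and the `oracle` predicate below are stand-ins for illustration, not ShieldGen's actual interfaces:

```python
# Illustrative sketch of oracle-guided signature refinement: for each
# field of the (format-aware) attack instance, keep only the values the
# oracle confirms still trigger the exploit; fields where every sampled
# value works become wildcards in the signature.
def refine_signature(attack_fields, candidate_values, oracle):
    signature = {}
    for field, _original in attack_fields.items():
        exploiting = set()
        for value in candidate_values[field]:
            probe = dict(attack_fields, **{field: value})
            if oracle(probe):  # does this probe still exploit the bug?
                exploiting.add(value)
        if exploiting == set(candidate_values[field]):
            signature[field] = "*"      # field irrelevant to the exploit
        else:
            signature[field] = exploiting
    return signature
```

With an oracle that flags payloads longer than eight bytes as exploiting, the search would wildcard the command field and constrain the payload field, mirroring how probe feedback narrows the signature to the vulnerability condition.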
Intrusion detection systems are fundamentally passive and fail-open. Because their primary task is classification, they do nothing to prevent an attack from succeeding. An intrusion prevention system (IPS) adds protection mechanisms that provide fail-safe semantics, automatic response capabilities, and adaptive enforcement. We present FLIPS (Feedback Learning IPS), a hybrid approach to host security that prevents binary code injection attacks. It incorporates three major components: an anomaly-based classifier, a signature-based filtering scheme, and a supervision framework that employs Instruction Set Randomization (ISR). Since ISR prevents code injection attacks and can also precisely identify the injected code, we can tune the classifier and the filter via a learning mechanism based on this feedback. Capturing the injected code allows FLIPS to construct signatures for zero-day exploits. The filter can discard input that is anomalous or matches known malicious input, effectively protecting the application from additional instances of an attack, even zero-day attacks or attacks that are metamorphic in nature. FLIPS does not require a known user base and can be deployed transparently to clients and with minimal impact on servers. We describe a prototype that protects HTTP servers, but FLIPS can be applied to a variety of server and client applications.
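The feedback loop described above can be sketched schematically: the ISR-protected supervisor catches injected code, which is fed back to the filter as a new signature, while an anomaly score handles inputs with no matching signature. All class and function names here are illustrative placeholders, not FLIPS internals:

```python
# Schematic of a feedback-learning filter: drop input that matches a
# learned signature or scores as anomalous; learn new signatures from
# injected code captured by the ISR supervision layer.
class FeedbackFilter:
    def __init__(self, anomaly_score, threshold):
        self.signatures = set()           # learned from captured injections
        self.anomaly_score = anomaly_score
        self.threshold = threshold

    def allow(self, request):
        if any(sig in request for sig in self.signatures):
            return False                  # matches known malicious input
        if self.anomaly_score(request) > self.threshold:
            return False                  # anomalous: drop conservatively
        return True

    def learn(self, injected_code):
        # Called when the ISR-protected supervisor catches an injection:
        # the captured byte sequence becomes a zero-day signature.
        self.signatures.add(injected_code)
```

The design point this illustrates is fail-safe composition: the signature check blocks known attacks exactly, while the anomaly score provides coverage for variants until the supervisor captures them and closes the loop.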
Current trends demonstrate an increasing use of polymorphism by attackers to disguise their exploits. The ability for malicious code to be easily, and automatically, transformed into semantically equivalent variants frustrates attempts to construct simple, easily verifiable representations for use in security sensors. In this paper, we present a quantitative analysis of the strengths and limitations of shellcode polymorphism, and describe the impact that these techniques have in the context of learning-based IDS systems. Our examination focuses on two problems: shellcode encryption-based evasion methods and targeted "blending" attacks. Both techniques are currently being used in the wild, allowing real exploits to evade IDS sensors. This paper provides metrics to measure the effectiveness of modern polymorphic engines and provides insights into their designs. We describe methods to evade statistics-based IDS sensors and present suggestions on how to defend against them. Our experimental results illustrate that modeling self-modifying shellcode, whether by signature-based methods or by certain classes of statistical models, is likely an intractable problem.
Abstract-We describe Instruction-Set Randomization (ISR), a general approach for safeguarding systems against any type of code-injection attack. We apply Kerckhoffs' principle to create OS process-specific randomized instruction sets (e.g., machine instructions) for the system executing potentially vulnerable software. An attacker who does not know the key to the randomization algorithm will inject code that is invalid for that (randomized) environment, causing a runtime exception. Our approach is applicable to machine-language programs, scripting languages, and interpreted languages. We discuss three approaches (protection for Intel x86 executables, Perl scripts, and SQL queries), one from each of the above categories. Our goal is to demonstrate the generality and applicability of ISR as a protection mechanism. Our emulator-based prototype demonstrates the feasibility of ISR for x86 executables, and should be directly usable on a suitably modified processor. We demonstrate how to mitigate the significant performance impact of emulation-based ISR by using several heuristics to limit the scope of randomized (and interpreted) execution to sections of code that may be more susceptible to exploitation. The SQL prototype consists of an SQL query-randomizing proxy that protects against SQL-injection attacks with no changes to database servers, minor changes to CGI scripts, and negligible performance overhead. Similarly, the performance penalty of a randomized Perl interpreter is minimal. Where the performance impact of our proposed approach is acceptable (i.e., in an already-emulated environment, in the presence of programmable or specialized hardware, or in interpreted languages), it can serve as a broad protection mechanism and complement other security mechanisms.
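The core idea behind ISR can be shown with a toy sketch: program bytes are XORed with a secret per-process key, and only code de-randomized with the correct key is valid. Real ISR operates on machine instructions inside an emulator or modified CPU; the byte strings and key handling below are purely illustrative:

```python
# Toy illustration of instruction-set randomization via XOR with a
# secret per-process key. Injected attacker code, which was not encoded
# under the key, de-randomizes to garbage and faults at execution time.
import secrets

def randomize(code: bytes, key: bytes) -> bytes:
    """XOR-encode code with a repeating key (the 'randomized' binary)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(code))

def derandomize(blob: bytes, key: bytes) -> bytes:
    return randomize(blob, key)  # XOR is its own inverse

key = secrets.token_bytes(16)        # per-process secret
program = b"\x55\x89\xe5\xc3"        # stand-in for legitimate code
stored = randomize(program, key)     # what lives in memory

# The execution environment de-randomizes with the right key and
# recovers valid code; attacker-injected bytes, XORed with the same
# key at execution time, decode to junk with overwhelming probability.
recovered = derandomize(stored, key)
```

Note how the security rests entirely on key secrecy, in line with Kerckhoffs' principle: the randomization scheme itself can be public.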
Polymorphic malcode remains a troubling threat. The ability for malcode to automatically transform into semantically equivalent variants frustrates attempts to rapidly construct a single, simple, easily verifiable representation. We present a quantitative analysis of the strengths and limitations of shellcode polymorphism and consider its impact on current intrusion detection practice. We focus on the nature of shellcode decoding routines. The empirical evidence we gather helps show that modeling the class of self-modifying code is likely intractable by known methods, including both statistical constructs and string signatures. In addition, we develop and present measures that provide insight into the capabilities, strengths, and weaknesses of polymorphic engines. In order to explore countermeasures to future polymorphic threats, we show how to improve polymorphic techniques and create a proof-of-concept engine expressing these improvements. Our results indicate that the class of polymorphic behavior is too greatly spread and varied to model effectively. Our analysis also supplies a novel way to understand the limitations of current signature-based techniques. We conclude that modeling normal content is ultimately a more promising defense mechanism than modeling malicious or abnormal content.
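The encryption-based polymorphism analyzed above can be illustrated minimally: each variant is the same payload XORed under a fresh random key and prefixed by a decoder carrying that key, so byte signatures matched against the payload differ across variants. The two-byte "decoder stub" below is symbolic; real engines emit executable decoding routines and mutate those too:

```python
# Minimal sketch of encryption-based shellcode polymorphism: every
# variant carries the same payload under a fresh one-byte XOR key, so
# the encoded bytes differ per variant while the semantics are fixed.
import os

def make_variant(payload: bytes) -> bytes:
    key = os.urandom(1)[0] or 1            # random nonzero one-byte key
    encoded = bytes(b ^ key for b in payload)
    decoder = bytes([0xEB, key])           # symbolic stub holding the key
    return decoder + encoded

def decode(variant: bytes) -> bytes:
    key = variant[1]
    return bytes(b ^ key for b in variant[2:])
```

Because the only invariant across variants is the (small, mutable) decoder, this is exactly why the paper's attention falls on decoding routines, and why modeling normal content can be more tractable than enumerating malicious variants.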