Abstract. Much data access occurs via HTTP, which is becoming a universal transport protocol. Because of this, HTTP has become a common exploit target, and several HTTP-specific IDSs have been proposed in response. However, each IDS is developed and tested independently, making direct comparisons difficult. We describe a framework for testing IDS algorithms and apply it to several proposed anomaly detection algorithms, testing all of them with identical data in an identical test environment. The results show serious limitations in all approaches, and we make predictions about the requirements for successful anomaly detection approaches used to protect web servers.
Intrusion detection is a key technology for self-healing systems designed to prevent or manage damage caused by security threats. Protecting web server-based applications using intrusion detection is challenging, especially when autonomy is required (i.e., without signature updates or extensive administrative overhead). Web applications are difficult to protect because they are large, complex, highly customized, and often created by programmers with little security background. Anomaly-based intrusion detection has been proposed as a strategy to meet these requirements. This paper describes how DFA (Deterministic Finite Automaton) induction can be used to detect malicious web requests. The method is used in combination with rules for reducing variability among requests and heuristics for filtering and grouping anomalies. With this setup, a wide variety of attacks is detectable with few false positives, even when the system is trained on data containing benign attacks (e.g., attacks that fail against properly patched servers).
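To make the idea concrete, the following is a minimal sketch of DFA-style anomaly detection over tokenized HTTP requests. It is an illustration, not the paper's actual induction algorithm: training builds a prefix-tree acceptor (a trivially precise DFA) over token sequences, with a single stand-in variability-reduction rule (collapsing numbers to a generic token), and detection flags any request whose token sequence leaves the automaton. All function names here are hypothetical.

```python
# Hedged sketch: DFA-style anomaly detection over tokenized HTTP requests.
# Not the paper's algorithm; a prefix-tree acceptor with one illustrative
# variability-reduction rule (numbers collapsed to "<NUM>").

def tokenize(request):
    """Split a request line into coarse tokens; decimal numbers are
    collapsed to a generic token to reduce variability among requests."""
    return ["<NUM>" if t.isdigit() else t
            for t in request.replace("/", " / ").split()]

def train(requests):
    """Induce a prefix-tree acceptor (an undergeneralizing DFA) where
    each nested dict maps a token to the next state."""
    dfa = {}
    for req in requests:
        state = dfa
        for tok in tokenize(req):
            state = state.setdefault(tok, {})
    return dfa

def is_anomalous(dfa, request):
    """A request is anomalous if any transition is missing from the DFA."""
    state = dfa
    for tok in tokenize(request):
        if tok not in state:
            return True
        state = state[tok]
    return False

normal = ["GET /index.html", "GET /article/42", "GET /article/7"]
dfa = train(normal)
assert not is_anomalous(dfa, "GET /article/99")    # number rule generalizes
assert is_anomalous(dfa, "GET /../../etc/passwd")  # unseen structure flagged
```

The variability-reduction rule does the generalization work: without it, every distinct article ID would be a false positive, which is exactly the undergeneralization problem the abstract describes.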
Anomaly detection systems have the potential to detect zero-day attacks. However, these systems can suffer from high rates of false positives and can be evaded through mimicry attacks. The key to addressing both problems is careful control of model generalization. An anomaly detection system that undergeneralizes generates too many false positives, while one that overgeneralizes misses attacks. In this paper, we present a methodology for creating anomaly detection systems that make appropriate trade-offs between model precision and generalization. Specifically, we propose that systems be created by taking an appropriate, undergeneralizing data modeling method and extending it with data pre-processing generalization heuristics. To show the utility of our methodology, we show how it has been applied to the problem of detecting malicious web requests.
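The methodology above can be sketched in a few lines. Assume (as an illustration only; the specific heuristics and names below are not from the paper) an undergeneralizing base model that accepts only requests seen in training, extended by pre-processing heuristics that normalize variable substrings before the model sees them:

```python
import re

# Hedged sketch of "undergeneralizing model + pre-processing generalization
# heuristics". The base model is exact set membership; generalization is
# added only through normalization rules applied before modeling.
HEURISTICS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),  # collapse hex constants first
    (re.compile(r"\d+"), "<NUM>"),             # then decimal numbers
]

def normalize(request):
    """Apply each generalization heuristic in order."""
    for pattern, token in HEURISTICS:
        request = pattern.sub(token, request)
    return request

class ExactMatchDetector:
    """Deliberately undergeneralizing base model: a request is normal
    only if its normalized form was seen during training."""
    def __init__(self, training_requests):
        self.known = {normalize(r) for r in training_requests}

    def is_anomalous(self, request):
        return normalize(request) not in self.known

det = ExactMatchDetector(["GET /item?id=1", "GET /item?id=23"])
assert not det.is_anomalous("GET /item?id=999")     # heuristic generalizes
assert det.is_anomalous("GET /item?id=1 OR 1=1--")  # structure change flagged
```

The trade-off is explicit in this design: each heuristic widens the accepted language by a controlled, inspectable amount, rather than relying on an opaque model to generalize on its own.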