Detecting massive network events such as worm outbreaks in fast IP networks, for example Internet backbones, is hard. One problem is that the sheer volume of traffic data precludes real-time analysis of details. Another is that the specific characteristics of such events are not known in advance. This creates a need for analysis methods that are real-time capable and can handle large amounts of traffic data. We have developed an entropy-based approach that determines and reports the entropy content of traffic parameters such as IP addresses. Changes in entropy content indicate a massive network event. We present analyses of two Internet worms as a proof of concept. While our primary focus is the detection of fast worms, our approach should also be able to detect other network events. We discuss implementation alternatives, give benchmark results, and show that our approach scales very well.
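To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of entropy-based event detection: it computes the Shannon entropy of source IP addresses over fixed-size packet windows and flags windows where the entropy shifts sharply. The window size, threshold, and the `src_ip` field name are illustrative assumptions.

    # Minimal sketch of entropy-based event detection; window size,
    # threshold, and packet-record fields are illustrative assumptions.
    import math
    from collections import Counter

    def shannon_entropy(values):
        """Shannon entropy (bits) of the empirical distribution of `values`."""
        counts = Counter(values)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def detect_entropy_shifts(packets, window=10_000, threshold=0.5):
        """Yield (window_index, entropy) whenever the source-IP entropy
        jumps by more than `threshold` bits relative to the previous window."""
        prev = None
        for i in range(0, len(packets) - window + 1, window):
            h = shannon_entropy(p["src_ip"] for p in packets[i:i + window])
            if prev is not None and abs(h - prev) > threshold:
                yield i // window, h
            prev = h

A scanning worm, for instance, would typically raise the entropy of destination addresses (many random targets) while lowering that of source addresses (few infected senders), producing exactly the kind of shift this sketch flags.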
Packet sampling methods such as Cisco's NetFlow are widely employed by large networks to reduce the amount of traffic data measured. A key problem with packet sampling is that it is inherently a lossy process, discarding potentially useful information. In this paper, we empirically evaluate the impact of sampling on anomaly detection metrics. Starting with unsampled flow records collected during the Blaster worm outbreak, we reconstruct the underlying packet trace and simulate packet sampling at increasing rates. We then use our knowledge of the Blaster anomaly to build a baseline of normal traffic (without Blaster) against which we can measure the anomaly size at various sampling rates. This approach allows us to evaluate the impact of packet sampling on anomaly detection without being restricted to, or biased by, a particular anomaly detection method. We find that packet sampling does not disturb the anomaly size when measured in volume metrics such as the number of bytes and number of packets, but grossly biases the number of flows. However, recently proposed entropy-based summarizations of packet and flow counts are less affected by sampling and expose the Blaster worm outbreak even at high sampling rates. Our findings suggest that entropy summarizations are more resilient to sampling than volume metrics. Thus, while not perfect, sampling still preserves sufficient distributional structure, which, when harnessed by tools like entropy, can expose hard-to-detect scanning anomalies.
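The experiment's two key ingredients, random packet sampling at rate 1-in-N and entropy summarization of a traffic feature, can be sketched as follows. This is an illustration under assumed packet-record fields (`dst_ip`), not the paper's actual pipeline:

    # Illustrative sketch: thin a trace by 1-in-N random sampling, then
    # compare a volume metric against an entropy summarization.
    import math
    import random
    from collections import Counter

    def sample_packets(packets, n):
        """Keep each packet independently with probability 1/n, a common
        model of random packet sampling at rate 1-in-n."""
        p = 1.0 / n
        return [pkt for pkt in packets if random.random() < p]

    def entropy(counter):
        """Shannon entropy (bits) of a frequency Counter."""
        total = sum(counter.values())
        return -sum((c / total) * math.log2(c / total) for c in counter.values())

    def summarize(packets):
        """One volume metric and one entropy summarization for comparison."""
        return {
            "packets": len(packets),  # volume metric: shrinks roughly by 1/n
            "dst_ip_entropy": entropy(Counter(p["dst_ip"] for p in packets)),
        }

Comparing summarize(trace) with summarize(sample_packets(trace, n)) for growing n mimics the paper's setup: the packet count drops mechanically with n, while the entropy of the destination-address distribution degrades far more gracefully.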
Anomaly extraction is an important problem essential to several applications, ranging from root-cause analysis to attack mitigation and the testing of anomaly detectors. Anomaly extraction is preceded by an anomaly detection step, which detects anomalous events and may identify a large set of possibly associated event flows. The goal of anomaly extraction is to find and summarize the set of flows that are effectively caused by the anomalous event. In this work, we use meta-data provided by several histogram-based detectors to identify suspicious flows and then apply association rule mining to find and summarize the event flows. Using rich traffic data from a backbone network (SWITCH/AS559), we show that we can reduce the classification cost, in terms of items (flows or rules) that need to be classified, by several orders of magnitude. Further, we show that our techniques effectively isolate event flows in all analyzed cases and trigger on average between 2 and 8.5 false positives, which can be trivially sorted out by an administrator.
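As an illustration of the rule-mining step, here is a minimal Apriori-style sketch that summarizes a set of pre-flagged suspicious flows by frequent (field, value) combinations. The field names and support threshold are assumptions for illustration; the paper's exact mining procedure may differ:

    # Apriori-style sketch: summarize suspicious flows by frequent
    # (field, value) combinations. Fields and threshold are illustrative.
    from collections import Counter
    from itertools import combinations

    FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "proto")

    def frequent_itemsets(flows, min_support=0.1):
        """Return (field, value) item sets (size 1 and 2) whose relative
        support among the suspicious `flows` reaches `min_support`."""
        n = len(flows)
        singles = Counter((f, flow[f]) for flow in flows for f in FIELDS)
        frequent = {(item,): c / n for item, c in singles.items()
                    if c / n >= min_support}
        keep = {item for (item,) in frequent}  # Apriori pruning
        pairs = Counter()
        for flow in flows:
            items = sorted((f, flow[f]) for f in FIELDS if (f, flow[f]) in keep)
            for pair in combinations(items, 2):
                pairs[pair] += 1
        frequent.update({pair: c / n for pair, c in pairs.items()
                         if c / n >= min_support})
        return frequent

A combination such as (('dst_port', 135), ('proto', 6)) covering most suspicious flows would then be reported as a compact summary of the event, rather than listing thousands of individual flows.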
Fast Internet worms are a relatively new threat to Internet infrastructure and hosts. We discuss the motivation for and possibilities of studying the behaviour of such worms, as well as the degrees of freedom that worm writers have. To facilitate the study of fast worms, we have designed a simulator. We describe the design of this simulator, discuss practical experience gained with it, and compare observations of past worms with simulated behaviour. One specific feature of the simulator is that its Internet model can represent network bandwidth and latency constraints.
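For intuition about what such a simulator computes, here is a toy discrete-time model of a random-scanning worm. It deliberately omits the bandwidth and latency modelling that distinguishes the paper's simulator, and all parameters are illustrative:

    # Toy discrete-time simulation of a random-scanning worm; parameters
    # are illustrative and bandwidth/latency effects are not modelled.
    import random

    def simulate_worm(address_space=2**32, vulnerable=10_000,
                      scan_rate=4_000, ticks=300, seed=1):
        """Each tick, every infected host sends `scan_rate` probes to
        uniformly random addresses; a still-susceptible host is hit with
        probability 1 - (1 - 1/address_space) ** total_probes."""
        rng = random.Random(seed)
        infected, susceptible = 1, vulnerable - 1
        history = [infected]
        for _ in range(ticks):
            probes = infected * scan_rate
            p_hit = 1.0 - (1.0 - 1.0 / address_space) ** probes
            newly = sum(1 for _ in range(susceptible) if rng.random() < p_hit)
            infected += newly
            susceptible -= newly
            history.append(infected)
        return history

The resulting infection curve follows the familiar logistic shape of random-scanning worms: slow initial growth, an explosive middle phase, then saturation as the susceptible population is exhausted.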
This document describes a file format for the storage of flow data based upon the IP Flow Information Export (IPFIX) protocol. It proposes a set of requirements for flat-file, binary flow-data file formats and then specifies the IPFIX File format, built from IPFIX Messages, to meet these requirements. The IPFIX File format is designed to facilitate interoperability and reusability among a wide variety of flow storage, processing, and analysis tools.
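For orientation, an IPFIX File is a sequence of IPFIX Messages, each of which begins with the 16-byte message header defined in the IPFIX protocol specification (RFC 7011). The sketch below packs only that header; a real file would additionally carry template and data sets:

    # Sketch of the 16-byte IPFIX Message header (RFC 7011) that an
    # IPFIX File is built from; template and data sets are omitted.
    import struct
    import time

    IPFIX_VERSION = 10  # version field value for IPFIX

    def message_header(total_length, sequence_number, domain_id,
                       export_time=None):
        """Pack an IPFIX Message header in network byte order:
        version(2) | length(2) | exportTime(4) | sequenceNumber(4) |
        observationDomainID(4)."""
        if export_time is None:
            export_time = int(time.time())
        return struct.pack("!HHIII", IPFIX_VERSION, total_length,
                           export_time, sequence_number, domain_id)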
Companies that rely on the Internet for their daily business are challenged by uncontrolled, massive worm spreading and the lurking threat of large-scale distributed denial-of-service attacks. We present a new model and methodology that allow a company to qualitatively and quantitatively estimate possible financial losses due to partial or complete interruption of its Internet connectivity. Our systems-engineering approach is based on an in-depth analysis of the Internet dependence of different types of enterprises and on interviews with Swiss telcos, backbone operators, and Internet service providers. A discussion of sample scenarios illustrates the flexibility and applicability of our model.
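The paper's model itself is not reproduced here, but the flavour of a quantitative estimate can be conveyed with a deliberately simplified, hypothetical calculation. The formula and all parameters below are assumptions for illustration, not the authors' model:

    # Deliberately simplified, hypothetical loss estimate; NOT the
    # paper's model, only an illustration of the kind of output.
    def outage_loss(hourly_revenue, internet_dependence, outage_hours,
                    recovery_cost=0.0):
        """Rough loss ~= revenue at risk during the outage plus a fixed
        recovery cost; `internet_dependence` in [0, 1] scales how much
        of the revenue requires connectivity."""
        return hourly_revenue * internet_dependence * outage_hours + recovery_cost

    # Example: 50k/hour revenue, 40% Internet-dependent, 6 hours offline,
    # 20k recovery cost -> 50000 * 0.4 * 6 + 20000 = 140000.
    print(outage_loss(50_000, 0.4, 6, recovery_cost=20_000))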