We introduce a computationally scalable method for detecting small anomalous subgraphs in large, time-dependent graphs. This work is motivated by, and validated against, the challenge of identifying intruders operating inside enterprise-sized computer networks with 500 million communication events per day. Every observed edge (time series of communications between each pair of computers on the network) is modeled using observed and hidden Markov models to establish baselines of behavior for purposes of anomaly detection. These models capture the bursty, often human-caused, behavior that dominates a large subset of the edges. Individual edge anomalies are common, but the network intrusions we seek to identify always involve coincident anomalies on multiple adjacent edges. We show empirically that adjacent edges are primarily independent and that the likelihood of a subgraph of multiple coincident edges can be evaluated using only models of individual edges. We define a new scan statistic in which subgraphs of specific sizes and shapes (out-stars and 3-paths) are tested. We show that identifying these building-block shapes is sufficient to correctly identify anomalies of various shapes with acceptable false discovery rates in both simulated and real-world examples.
Introduction

In this chapter, we consider the problem of detecting locally anomalous activity in a set of time-dependent data having an underlying graph structure. While the method proposed can be applied to a general setting in which data is extracted from a graph over time, and in which anomalies occur in connected subgraphs, we will focus exclusively on the detection of attacks within a large computer network. Specifically, we are interested in detecting those attacks that create connected subgraphs within which the communications have deviated from historic behavior in some window of time.
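The idea of scanning for anomalous connected subgraphs in a time window can be illustrated with a minimal sketch; it is not the chapter's implementation. It assumes each directed edge has already been assigned an anomaly score for one window (for example, a log-likelihood ratio from its fitted model) and, relying on the approximate independence of adjacent edges, scores a subgraph by summing the scores of its edges. The function names `three_paths`, `out_stars`, and `scan_window`, the example scores, the threshold, and the choice to require at least two edges in a star are all hypothetical.

```python
from collections import defaultdict


def three_paths(edges):
    """Yield every directed 3-path (a->b, b->c, c->d) on four distinct nodes."""
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        for c in out[b]:
            if c in (a, b):
                continue
            for d in out[c]:
                if d in (a, b, c):
                    continue
                yield ((a, b), (b, c), (c, d))


def out_stars(edges, min_edges=2):
    """Yield, for each source node, the tuple of all its out-edges.

    Assumption: a star needs at least `min_edges` edges to be worth testing.
    """
    by_source = defaultdict(list)
    for u, v in edges:
        by_source[u].append((u, v))
    for star in by_source.values():
        if len(star) >= min_edges:
            yield tuple(star)


def scan_window(edge_scores, threshold):
    """Return (score, shape) pairs whose summed edge score exceeds threshold.

    Summing per-edge scores is only valid under the edge-independence
    assumption; the threshold here is an arbitrary illustrative value.
    """
    edges = list(edge_scores)
    shapes = list(three_paths(edges)) + list(out_stars(edges))
    flagged = []
    for shape in shapes:
        score = sum(edge_scores[e] for e in shape)
        if score > threshold:
            flagged.append((score, shape))
    return sorted(flagged, key=lambda item: item[0], reverse=True)


# Hypothetical per-edge anomaly scores for a single time window.
scores = {
    ("a", "b"): 4.0,
    ("b", "c"): 3.5,
    ("c", "d"): 5.0,
    ("a", "e"): 0.5,
    ("e", "f"): 0.25,
}
hits = scan_window(scores, threshold=10.0)
```

In this toy window, only the path a->b->c->d accumulates enough evidence to be flagged, even though no single edge exceeds the threshold on its own. A real detector would calibrate the threshold against historic data rather than fix it by hand.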
Data Analysis for Network Cyber-Security