The lack of data sets derived from operational enterprise networks continues to be a critical deficiency in the cyber security research community. Unfortunately, releasing viable data sets to the larger community is challenging for a number of reasons, primarily the difficulty of balancing security and privacy concerns against the fidelity and utility of the data. This chapter discusses the importance of cyber security research data sets and introduces a large data set derived from the operational network environment at Los Alamos National Laboratory. The hope is that this data set and associated discussion will act both as a catalyst for new research in cyber security and as motivation for other organizations to release similar data sets to the community.
Anomaly detection for identifying compromised user credentials in an enterprise network is an important research problem that has garnered much attention within industry in recent years. One important aspect of the research problem is peer-based user analysis. A method based on recommender system algorithms is proposed here, quantifying when a user activity is unlikely based on the behavior of similar users. Building several recommender system algorithms for separate user activities provides an additional advantage of allowing for different peer group structures depending on the user activity being considered.
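The peer-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simple user-based collaborative filter with Jaccard similarity, and the names `history`, `jaccard`, and `peer_score`, along with the toy users and activities, are hypothetical. A low score means that few similar users have performed the activity, so it would be flagged as unlikely.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of activities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def peer_score(history, user, item):
    """Similarity-weighted fraction of peers who have performed `item`.

    Assumption: peers are weighted by Jaccard similarity of their activity
    sets, with the scored item itself excluded from the comparison.
    """
    own = history[user] - {item}
    num = den = 0.0
    for other, items in history.items():
        if other == user:
            continue
        weight = jaccard(own, items - {item})
        num += weight * (item in items)
        den += weight
    return num / den if den else 0.0


# Hypothetical toy data: user -> set of resources the user has accessed.
history = {
    "alice": {"mail", "wiki", "build"},
    "bob": {"mail", "wiki", "build", "repo"},
    "carol": {"mail", "hr_db", "payroll"},
    "dave": {"mail", "hr_db"},
}

repo_score = peer_score(history, "alice", "repo")
payroll_score = peer_score(history, "alice", "payroll")
print(f"repo: {repo_score:.3f}, payroll: {payroll_score:.3f}")
```

Keeping one such `history` mapping per activity type (for example, one for authentication destinations and one for processes started) would give each activity its own peer structure, which is the advantage the abstract attributes to building separate recommenders.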
Statistical approaches to cyber-security involve building realistic probability models of computer network data. In a data pre-processing phase, separating automated events from those caused by human activity should improve statistical model building and enhance anomaly detection capabilities. This article presents a changepoint detection framework for identifying periodic subsequences of event times. The opening event of each subsequence can be interpreted as a human action which then generates an automated, periodic process. Difficulties arising from the presence of duplicate and missing data are addressed. The methodology is demonstrated using authentication data from the computer network of Los Alamos National Laboratory.
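As a rough illustration of separating periodic, automated subsequences from other events, the sketch below uses a plain run-length heuristic on inter-arrival gaps. It is a simplified stand-in, not the changepoint framework the abstract describes, and it does not handle duplicate or missing events. The function name `periodic_runs`, the parameters `tol` and `min_events`, and the event times are all hypothetical.

```python
def periodic_runs(times, tol=1.0, min_events=4):
    """Find runs of events whose consecutive gaps stay near a constant period.

    Returns a list of (start_index, end_index, period) tuples, where the
    period is the first gap of the run and later gaps may differ from it by
    at most `tol`. Assumes `times` is sorted in increasing order.
    """
    runs = []
    n = len(times)
    i = 0
    while i < n - 1:
        period = times[i + 1] - times[i]
        j = i + 1
        # Extend the run while the next gap matches the candidate period.
        while j + 1 < n and abs((times[j + 1] - times[j]) - period) <= tol:
            j += 1
        if j - i + 1 >= min_events:
            runs.append((i, j, period))
            i = j + 1
        else:
            i += 1
    return runs


# Hypothetical event times in seconds: sporadic events around a 60 s beacon.
times = [0, 7, 100, 160, 220, 280, 340, 900, 913]
runs = periodic_runs(times)
openers = [times[start] for start, _, _ in runs]
print(runs, openers)
```

Under this heuristic, the first event of each detected run plays the role of the opening event, which the abstract interprets as the human action that starts the automated process.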
Cybersecurity is an ever-important aspect of our interconnected world, but security defenses lag behind the adversaries who with increasing sophistication seek to disrupt cybersystems. The emergence of massively distributed systems such as the Internet of Things (IoT) has opened up new vulnerabilities that go beyond traditional protective measures such as firewalls, password protection, and single point-of-attack responses. To address these emerging vulnerabilities, data science has much to contribute, including methods of distributed statistical inference, data fusion, anomaly detection, and adversarial machine learning.