In this paper we describe an ontology developed for a cyber security knowledge graph database. This is intended to provide an organized schema that incorporates information from a large variety of structured and unstructured data sources, and includes all relevant concepts within the domain. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe areas for future work.
This survey focuses on intrusion detection systems (IDS) that leverage host-based data sources for detecting attacks on enterprise network. The host-based IDS (HIDS) literature is organized by the input data source, presenting targeted sub-surveys of HIDS research leveraging system logs, audit data, Windows Registry, file systems, and program analysis. While system calls are generally included in audit data, several publicly available system call datasets have spawned a flurry of IDS research on this topic, which merits a separate section. To accommodate current researchers, a section giving descriptions of publicly available datasets is included, outlining their characteristics and shortcomings when used for IDS evaluation. Related surveys are organized and described. All sections are accompanied by tables concisely organizing the literature and datasets discussed. Finally, challenges, trends, and broader observations are throughout the survey and in the conclusion along with future directions of IDS research. Overall, this survey was designed to allow easy access to the diverse types of data available on a host for sensing intrusion, the progressions of research using each, and the accessible datasets for prototyping in the area.
Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources significantly before proper classification into structured databases. In order to facilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction by employing a time-memory tradeoff that simultaneously circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. An implementation in the cyber-security domain is discussed as well as challenges to Natural Language Processing imposed by the security domain.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.