Abstract:Ensuring data confidentiality and integrity are key concerns for information security professionals, who typically have to obtain and integrate information from multiple sources to detect unauthorized data modifications and transmissions. The instrumentation that operating systems provide for the monitoring of file system level activity can yield important clues on possible data tampering and exfiltration activity but the raw data that these tools provide is difficult to interpret, contextualize and query. In … Show more
The integration of heterogeneous and weakly linked log data poses a major challenge in many log-analytic applications. Knowledge graphs (KGs) can facilitate such integration by providing a versatile representation that can interlink objects of interest and enrich log events with background knowledge. Furthermore, graph-pattern based query languages, such as SPARQL, can support rich log analyses by leveraging semantic relationships between objects in heterogeneous log streams. Constructing, materializing, and maintaining centralized log knowledge graphs, however, poses significant challenges. To tackle this issue, we propose VloGraph—a distributed and virtualized alternative to centralized log knowledge graph construction. The proposed approach does not involve any a priori parsing, aggregation, and processing of log data, but dynamically constructs a virtual log KG from heterogeneous raw log sources across multiple hosts. To explore the feasibility of this approach, we developed a prototype and demonstrate its applicability to three scenarios. Furthermore, we evaluate the approach in various experimental settings with multiple heterogeneous log sources and machines; the encouraging results from this evaluation suggest that the approach can enable efficient graph-based ad-hoc log analyses in federated settings.
The integration of heterogeneous and weakly linked log data poses a major challenge in many log-analytic applications. Knowledge graphs (KGs) can facilitate such integration by providing a versatile representation that can interlink objects of interest and enrich log events with background knowledge. Furthermore, graph-pattern based query languages, such as SPARQL, can support rich log analyses by leveraging semantic relationships between objects in heterogeneous log streams. Constructing, materializing, and maintaining centralized log knowledge graphs, however, poses significant challenges. To tackle this issue, we propose VloGraph—a distributed and virtualized alternative to centralized log knowledge graph construction. The proposed approach does not involve any a priori parsing, aggregation, and processing of log data, but dynamically constructs a virtual log KG from heterogeneous raw log sources across multiple hosts. To explore the feasibility of this approach, we developed a prototype and demonstrate its applicability to three scenarios. Furthermore, we evaluate the approach in various experimental settings with multiple heterogeneous log sources and machines; the encouraging results from this evaluation suggest that the approach can enable efficient graph-based ad-hoc log analyses in federated settings.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.