Efficient intrusion detection and analysis of the security landscape in big data environments present a challenge for today's users. Intrusion behavior can be described by provenance graphs that record the dependency relationships between intrusion processes and the infected files. Existing intrusion detection methods typically identify anomalies either in a single provenance path or in the whole provenance graph, and neither approach achieves both high detection accuracy and short detection time. We propose Pagoda, a hybrid approach that takes into account the anomaly degree of both a single provenance path and the whole provenance graph. It can identify an intrusion quickly if a serious compromise has been found on one path, and can further improve the detection rate by considering the behavior representation of the whole provenance graph. Pagoda uses a persistent memory database to store provenance and aggregates multiple similar items into one provenance record to minimize unnecessary I/O during detection analysis. In addition, it encodes duplicate items in the rule database and filters out noise that does not contain intrusion information. The experimental results on a wide variety of real-world applications demonstrate its performance and efficiency.
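The abstract does not give Pagoda's scoring formulas, so the following is only a minimal sketch of the two-stage control flow it describes: score each provenance path, raise an alert immediately if one path is anomalous enough, and otherwise fall back to a score over the whole graph. The rarity-based path score, the length-weighted graph score, the `path_threshold` and `graph_threshold` values, and the function names are hypothetical assumptions, not Pagoda's actual method.

```python
from collections import Counter
from typing import List, Tuple

# A provenance edge: (source entity, event type, destination entity).
Edge = Tuple[str, str, str]


def path_anomaly(path: List[Edge], edge_freq: Counter, total: int) -> float:
    """Hypothetical anomaly degree of one path: the mean rarity of its edges,
    where rarity = 1 - (benign frequency of the edge / total benign edges)."""
    if not path:
        return 0.0
    return sum(1.0 - edge_freq[e] / total for e in path) / len(path)


def detect(paths: List[List[Edge]], edge_freq: Counter,
           path_threshold: float = 0.9, graph_threshold: float = 0.6):
    """Two-stage check (hypothetical thresholds). Returns (alert, stage, score)."""
    total = max(sum(edge_freq.values()), 1)  # avoid division by zero
    scored = []
    for p in paths:
        s = path_anomaly(p, edge_freq, total)
        if s >= path_threshold:
            return True, "path", s  # fast exit: one path is clearly compromised
        scored.append((s, len(p)))
    # Graph stage (assumed form): path scores weighted by path length.
    edges = sum(n for _, n in scored)
    graph_score = sum(s * n for s, n in scored) / edges if edges else 0.0
    return graph_score >= graph_threshold, "graph", graph_score


if __name__ == "__main__":
    # Edge counts observed in hypothetical benign training data.
    benign = Counter({("bash", "exec", "ls"): 80, ("ls", "read", "/etc/passwd"): 20})
    print(detect([[("nc", "write", "/tmp/x")]], benign))  # unseen edge: path stage
```

The early return is what gives the hybrid design its speed: a single clearly malicious path ends the analysis, while the graph-level stage catches behavior that is only moderately unusual on each path but suspicious in aggregate.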
Efficient provenance storage is an essential step towards the adoption of provenance. In this paper, we analyze the provenance collected from multiple workloads with a view towards efficient storage. Based on our analysis, we characterize the properties of provenance with respect to long-term storage. We then propose a hybrid scheme that takes advantage of both the graph structure of provenance data and the inherent duplication in provenance data. Our evaluation indicates that our hybrid scheme, a combination of web graph compression (adapted for provenance) and dictionary encoding, provides the best tradeoff in terms of compression ratio, compression time and query performance when compared to other compression schemes.
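The abstract names the two ingredients of the hybrid scheme but not how web graph compression was adapted, so the sketch below only illustrates the basic form of each under stated assumptions. Dictionary encoding replaces repeated provenance strings (for example, file paths) with small integer ids. Gap encoding, one of the techniques used in web graph compression, stores a sorted adjacency list as its first node id followed by differences, which a variable-length byte code can then store in fewer bytes. The full WebGraph scheme also uses reference copying and interval encoding, which are omitted here; all function names are hypothetical.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Map each distinct string to an integer id, in order of first appearance."""
    table: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in table:
            table[v] = len(table)
        codes.append(table[v])
    return table, codes


def gap_encode(successors: List[int]) -> List[int]:
    """Sort a node's successor ids and store the first id followed by gaps."""
    s = sorted(successors)
    if not s:
        return []
    return [s[0]] + [b - a for a, b in zip(s, s[1:])]


def gap_decode(gaps: List[int]) -> List[int]:
    """Invert gap_encode: a running sum restores the sorted successor ids."""
    return list(accumulate(gaps))


def varint(n: int) -> bytes:
    """Unsigned LEB128-style varint for non-negative n: 7 bits per byte,
    high bit set when more bytes follow. Small gaps fit in one byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)
```

For example, the successor list [1000, 1002, 1005] takes 6 bytes as raw varints (2 bytes each), but its gap form [1000, 2, 3] takes 4 bytes, since only the first entry needs two bytes. Provenance graphs whose node ids are assigned close together benefit in the same way.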
Container-based virtualization has gradually become a mainstream solution in today's cloud computing environments. Detecting and analyzing anomalies in containers presents a major challenge for cloud vendors and users. This paper proposes an online container anomaly detection system that monitors and analyzes multidimensional resource metrics of the containers based on an optimized isolation forest algorithm. To improve detection accuracy, it assigns each resource metric a weight and replaces the random feature selection of the isolation forest algorithm with weighted feature selection according to the resource bias of the container. In addition, it can identify abnormal resource metrics and automatically adjust the monitoring period to reduce monitoring delay and system overhead. Moreover, it can locate the cause of anomalies by analyzing and exploring the container logs. The experimental results demonstrate the performance and efficiency of the system in detecting typical anomalies in containers in both simulated and real cloud environments.
Index Terms: Docker container, anomaly monitoring, isolation forest, log analysis
1 INTRODUCTION
With the popularity of cloud computing platforms, more and more enterprises have their own data centers, providing services to customers with different needs. One of the key technologies in the data center is virtualization. The Docker container [1], as a new virtualization technology, has many attractive advantages, such as ease of deployment and fast start-up. Thus it has quickly been adopted by major companies (e.g., Amazon [2], IBM [3] and Oracle [4]). However, with the increasingly large-scale application of container clusters, container security and stability have also drawn increasing attention. For instance, an outage of Amazon's cloud, which is built on container and virtual machine clusters, made thousands of websites and apps unavailable [5]. Therefore, it is crucial to detect abnormalities in containers in a timely manner to ensure the service quality of the cloud.
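The container anomaly detection abstract above replaces the uniform random feature choice of the isolation forest with a weighted choice based on the container's resource bias. The sketch below is a minimal from-scratch isolation forest that follows the standard algorithm (subsampling to `sample_size` points, a height limit of ceil(log2(psi)), and the score s(x) = 2^(-E[h(x)] / c(psi))) and changes only the feature selection step to `random.choices` with per-metric weights. The weights, the metric order and the class name are hypothetical; the paper's rule for deriving weights is not given here.

```python
import math
import random
from typing import List, Sequence

EULER_GAMMA = 0.5772156649


def avg_path(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search over n points,
    used to normalize isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, weights, rng):
    n = len(X)
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    # Modified step: pick the split metric with probability proportional to
    # its weight instead of uniformly at random.
    f = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    lo = min(x[f] for x in X)
    hi = max(x[f] for x in X)
    if lo == hi:  # a constant metric cannot be split; stop here
        return ("leaf", n)
    split = rng.uniform(lo, hi)
    left = [x for x in X if x[f] < split]
    right = [x for x in X if x[f] >= split]
    return ("node", f, split,
            build_tree(left, depth + 1, max_depth, weights, rng),
            build_tree(right, depth + 1, max_depth, weights, rng))


def path_length(x, node, depth=0):
    if node[0] == "leaf":
        return depth + avg_path(node[1])  # estimate for the unbuilt subtree
    _, f, split, left, right = node
    return path_length(x, left if x[f] < split else right, depth + 1)


class WeightedIsolationForest:
    def __init__(self, weights: Sequence[float], n_trees: int = 100,
                 sample_size: int = 256, seed: int = 0):
        self.weights = list(weights)  # one weight per resource metric
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees = []
        self.c_psi = 1.0

    def fit(self, X: List[Sequence[float]]) -> "WeightedIsolationForest":
        psi = min(self.sample_size, len(X))
        if psi < 2:
            raise ValueError("need at least two samples")
        max_depth = math.ceil(math.log2(psi))
        self.trees = [build_tree(self.rng.sample(X, psi), 0, max_depth,
                                 self.weights, self.rng)
                      for _ in range(self.n_trees)]
        self.c_psi = avg_path(psi)
        return self

    def score(self, x: Sequence[float]) -> float:
        """Anomaly score in (0, 1]; values close to 1 indicate anomalies."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / self.c_psi)
```

For a hypothetical CPU-intensive container monitored on [CPU %, memory %], weights such as [0.8, 0.2] make most splits fall on CPU, so a CPU spike like [95, 30] is isolated in fewer splits and scores higher than a typical sample like [50, 30].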