In the past few years, data lakes have emerged as a trending topic in big data technologies. Although the literature presents different points of view on their functionality, a data lake serves mainly to store a variety of data in a big data context. In this paper, we aim to identify and analyze data lake definitions and possible architectures. Our methodology combined a systematic literature mapping based on PRISMA, software engineering best practices for conducting reviews, and the Kappa method to assess the quality of the results. We performed the search in eight different electronic databases to cover a wide variety of publishers in Computer Science. We first identified 662 papers matching our search criteria; after filtering, we selected 87 papers for review. We found that the term "data lake" was first defined by James Dixon in 2010. We also found that the term is often related to raw data repositories. Building on the identified definitions, we propose a new one as a means to better state what data lakes refer to and to improve how the community uses them. Moreover, we found that Hadoop and its ecosystem compose the most used toolset for creating data lakes, indicating that this is the mainstream architecture for data lakes given today's available technologies.
Intrusion detection systems have traditionally been based on characterizing an attack and tracking activity on the system to see whether it matches that characterization. Recently, new intrusion detection systems based on data mining have begun to appear in the field. This paper describes the design of, and our experiences with, the ADAM (Audit Data Analysis and Mining) system, which we use as a testbed to study how useful data mining techniques can be in intrusion detection.