Data cleansing (or data scrubbing) is the process of detecting and correcting errors and inconsistencies in the data of a data warehouse. Poor-quality data, i.e., dirty data present in a data mart, can be avoided by applying various data cleaning strategies, which leads to more accurate and hence more reliable decision making. Quality data can only be produced by cleaning and pre-processing the data before loading it into the data warehouse. Because no single algorithm addresses every type of dirty data, an organization has to prioritize its needs and choose an algorithm according to its requirements and the kinds of dirty data that occur. This paper focuses on two data cleaning algorithms, Alliance Rules and HADCLEAN, and their approaches to data quality. It also compares the factors and aspects common to both.
General Terms: Data Mart, Data Warehouse, Dirty Data.
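As a rough illustration of the "detect and correct before loading" idea, the sketch below normalizes text fields, maps common missing-value markers to None, and drops records that become duplicates after normalization. It is a generic, hypothetical pre-load step and is not an implementation of Alliance Rules or HADCLEAN, whose details the abstract does not give; the function names, the MISSING marker set, and the sample records are assumptions made for illustration.

```python
# Hypothetical set of strings treated as "missing" (an assumption, not from the paper).
MISSING = {"", "n/a", "na", "null", "-"}


def normalize(value):
    """Collapse whitespace, case-fold, and map missing-value markers to None."""
    if value is None:
        return None
    text = " ".join(str(value).split()).casefold()
    return None if text in MISSING else text


def clean(records, key_fields):
    """Normalize every field, then keep only the first record for each key."""
    seen = set()
    cleaned = []
    for rec in records:
        norm = {k: normalize(v) for k, v in rec.items()}
        key = tuple(norm.get(f) for f in key_fields)
        if key in seen:  # duplicate once inconsistencies are removed
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


records = [
    {"name": "John  Smith", "city": "Pune "},
    {"name": "john smith", "city": "PUNE"},
    {"name": "Asha Rao", "city": "N/A"},
]
print(clean(records, ["name", "city"]))
# [{'name': 'john smith', 'city': 'pune'}, {'name': 'asha rao', 'city': None}]
```

Exact matching on normalized keys only catches the simplest duplicates; dedicated cleaning algorithms typically go further with approximate matching, which is where approaches such as those compared in the paper differ.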
Today the world depends on so many advanced technologies and network systems that protecting them has become essential, whether from malicious attacks intended to break the system, from unauthorized access attempted for financial gain, or from simple intrusion. This creates the need for Intrusion Detection Systems. Many algorithms have been proposed to implement such systems, and they require building a model from a training data set. In this paper, the NSL-KDD data set is used to train the system with a Naïve Bayes approach, and an algorithm based on feature selection is proposed to improve its accuracy. A threshold concept, which works on the principle of the C4.5 algorithm, is also introduced. The proposed algorithm is then applied to another data set supplied by the user, which is also part of NSL-KDD. This paper discusses how the proposed algorithm improves the classification performance of the Naïve Bayes classifier and reduces the false alarm rate to some extent.
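The combination described above, feature selection with a C4.5-style threshold feeding a Naïve Bayes classifier, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's proposed algorithm: it assumes the threshold is applied to the C4.5 gain ratio of each categorical feature, uses a categorical Naïve Bayes with Laplace smoothing, and runs on a tiny hypothetical toy data set rather than NSL-KDD. The threshold value 0.5 and all function names are illustrative choices.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, j):
    """C4.5-style gain ratio of categorical feature j."""
    n = len(rows)
    groups = defaultdict(list)  # feature value -> labels of rows with that value
    for row, y in zip(rows, labels):
        groups[row[j]].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def select_features(rows, labels, threshold):
    """Keep indices of features whose gain ratio reaches the (assumed) threshold."""
    return [j for j in range(len(rows[0])) if gain_ratio(rows, labels, j) >= threshold]


class NaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing over the selected features."""

    def fit(self, rows, labels, features):
        self.features = features
        self.n = len(labels)
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (class, feature) -> value counts
        self.values = defaultdict(set)  # feature -> distinct values seen in training
        for row, y in zip(rows, labels):
            for j in features:
                self.value_counts[(y, j)][row[j]] += 1
                self.values[j].add(row[j])
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior
            for j in self.features:
                k = len(self.values[j])
                seen = self.value_counts[(c, j)][row[j]]
                score += math.log((seen + 1) / (count + k))  # smoothed log likelihood
            if score > best_score:
                best, best_score = c, score
        return best


def false_alarm_rate(y_true, y_pred, normal="normal"):
    """Fraction of normal records wrongly flagged as attacks: FP / (FP + TN)."""
    normal_preds = [p for t, p in zip(y_true, y_pred) if t == normal]
    if not normal_preds:
        return 0.0
    return sum(p != normal for p in normal_preds) / len(normal_preds)


# Hypothetical toy records (protocol, flag, noise); NOT taken from NSL-KDD.
rows = [("tcp", "SF", "a"), ("tcp", "SF", "b"), ("udp", "SF", "a"),
        ("tcp", "S0", "b"), ("icmp", "S0", "a"), ("tcp", "S0", "a")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

selected = select_features(rows, labels, threshold=0.5)
model = NaiveBayes().fit(rows, labels, selected)
print(selected)  # [1]
print(model.predict(("udp", "S0", "b")))  # attack
```

Here only the flag feature survives the threshold: it separates the classes perfectly (gain ratio 1.0), while the protocol feature scores about 0.27 and the noise feature about 0. Dropping weak or irrelevant features before training is one common way such a filter can reduce misclassifications, including false alarms on normal traffic.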