Mining frequent and infrequent itemsets from a given dataset is one of the most important fields of data mining. When frequent and infrequent itemsets are mined simultaneously, the infrequent itemsets become very important because they contain many valuable negative association rules. Mining frequent itemsets is highly expensive when the minimum threshold is low, whereas mining infrequent itemsets is highly expensive when the minimum threshold is high. When the dataset is very large, both the memory usage and the computational cost of mining infrequent items are very high. In addition, a single processor's memory and CPU resources are not enough to handle very large datasets. Parallel and distributed computing are effective approaches to handling large datasets. In this paper, we propose a method based on the Hadoop MapReduce model that can handle massive datasets when mining infrequent itemsets. Experiments are performed on an 8-node cluster with a synthetic dataset. The performance study shows that the proposed method is efficient in handling very large datasets.
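The abstract does not describe the map and reduce functions themselves, so the following is only a minimal sketch of how itemset counting is commonly phrased in the MapReduce style. It emulates the map, shuffle, and reduce steps in plain Python rather than calling any Hadoop API. The function names (`map_phase`, `reduce_phase`, `split_by_support`), the absolute count threshold `min_count`, the itemset size cap `max_size`, and the toy transactions are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, max_size):
    """Mapper (hypothetical): emit (itemset, 1) for each itemset of up to max_size items."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_phase(pairs):
    """Shuffle and reducer (hypothetical): sum the counts emitted for each itemset."""
    counts = defaultdict(int)
    for itemset, one in pairs:
        counts[itemset] += one
    return dict(counts)


def split_by_support(transactions, min_count, max_size=2):
    """Partition the counted itemsets into frequent and infrequent by min_count."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    counts = reduce_phase(pairs)
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    infrequent = {s: c for s, c in counts.items() if c < min_count}
    return frequent, infrequent


# Toy data, invented for this sketch.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk", "tea"],
]
frequent, infrequent = split_by_support(transactions, min_count=2)
```

Note that this sketch only counts itemsets that occur in at least one transaction; itemsets with zero support are never emitted by the mapper and would need separate handling.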
Abstract: Positive and negative association rules are important for finding useful information hidden in large datasets; negative association rules in particular can reflect mutually exclusive correlations among items. Association rule mining among frequent items has been extensively studied in data mining research. However, in recent years, there has been an increasing demand for mining infrequent items. In this paper, we propose a tree-based approach that stores both frequent and infrequent itemsets in order to mine both positive and negative association rules from them. It minimizes I/O overhead by scanning the database only once. The performance study shows that the proposed method is more efficient than the previously proposed method.
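The abstract does not specify the tree structure, so the following is a sketch under stated assumptions rather than the authors' design. It assumes a plain prefix tree in which each transaction is inserted as a lexicographically sorted path and no item is pruned, which lets the tree be built in a single scan while retaining both frequent and infrequent itemsets. The names `Node`, `build_tree`, `support_count`, and `negative_confidence` are hypothetical. The confidence of a negative rule A ⇒ ¬B is computed with the standard identity (supp(A) − supp(A ∪ B)) / supp(A).

```python
class Node:
    """One tree node: how many transactions pass through it, plus its children."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_tree(transactions):
    """Build the prefix tree in one pass over the transactions."""
    root = Node()
    for t in transactions:
        node = root
        node.count += 1
        for item in sorted(set(t)):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def support_count(node, itemset):
    """Count transactions containing every item of itemset (a sorted tuple)."""
    if not itemset:
        return node.count
    total = 0
    for item, child in node.children.items():
        if item == itemset[0]:
            total += support_count(child, itemset[1:])
        elif item < itemset[0]:
            total += support_count(child, itemset)
        # item > itemset[0]: paths are sorted, so itemset[0] cannot occur below.
    return total


def negative_confidence(root, a, b):
    """Confidence of the negative rule A => not B."""
    supp_a = support_count(root, tuple(sorted(set(a))))
    if supp_a == 0:
        return 0.0
    supp_ab = support_count(root, tuple(sorted(set(a) | set(b))))
    return (supp_a - supp_ab) / supp_a


# Toy data, invented for this sketch.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk", "tea"],
]
root = build_tree(transactions)
bread_not_tea = negative_confidence(root, ["bread"], ["tea"])
```

In this toy data, bread and tea never occur together, so the rule bread ⇒ ¬tea holds in every transaction that contains bread.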