FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters

Xun, Yaling; Zhang, Jifu; Qin, Xiao; Zhao, Xujun

doi:10.1109/tpds.2016.2560176

Cited by 63 publications

(30 citation statements)

References 29 publications


“…MapReduce is a software framework designed for cluster computing. A cluster system is a multiprocessing architecture that connects multiple hosts and offers high computing power and reliability to meet the needs of various types of applications. MapReduce and the Hadoop Distributed File System (HDFS) are the two core components of this framework for achieving decentralized computing.…”

Section: Related Work (mentioning)

confidence: 99%
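To make the map, shuffle, and reduce stages concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are hypothetical, and the in-memory grouping only stands in for the framework's distributed shuffle. The job counts how many transactions contain each item, which is the first counting pass of frequent itemset mining.

```python
from collections import defaultdict
from itertools import chain


def map_phase(transaction):
    """Mapper: emit (item, 1) once for each distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the partial counts for one key."""
    return key, sum(values)


def run_job(transactions):
    """Run map -> shuffle -> reduce in one process and return {item: support}."""
    intermediate = chain.from_iterable(map_phase(t) for t in transactions)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    db = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
    print(sorted(run_job(db).items()))  # [('a', 3), ('b', 2), ('c', 2)]
```

Deduplicating items inside the mapper means each transaction contributes at most one count per item, which is what support counting requires.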

“…Figure illustrates the HDFS architecture, in which the coordinating NameNode schedules processes using the metadata and assigns them to a cluster of DataNodes. All the data are split into blocks that are stored on different DataNodes, and each block is replicated on several other nodes. When a program requires access to a file, the NameNode coordinates the relevant DataNodes to respond, and the NameNode moves the files stored in HDFS and simultaneously copies them to the other DataNodes.…”

Section: Related Work (mentioning)

confidence: 99%
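The block-and-replica layout described in this citation can be sketched as follows. This is an illustrative model, not the HDFS implementation: the helper names are hypothetical, sizes are in MB, and replicas are placed round-robin, whereas real HDFS uses a rack-aware placement policy. The defaults (128 MB blocks, replication factor 3) match the usual HDFS defaults for `dfs.blocksize` and `dfs.replication`; the returned mapping plays the role of the block-location metadata held by the NameNode.

```python
import math


def split_into_blocks(file_size, block_size=128):
    """Return the sizes (MB) of the blocks a file is cut into; the last block may be shorter."""
    if file_size <= 0:
        return []
    count = math.ceil(file_size / block_size)
    return [block_size] * (count - 1) + [file_size - block_size * (count - 1)]


def place_replicas(num_blocks, datanodes, replication=3):
    """Map each block ID to `replication` distinct DataNodes (round-robin).

    Simplification: real HDFS uses a rack-aware placement policy instead.
    The returned dict models the block-location metadata kept by the NameNode.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + r) % len(datanodes)]
                               for r in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300)
    print(blocks)  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

For a 300 MB file, block 2 (the 44 MB tail) lands on `dn3`, `dn4`, and `dn1`, so losing any single DataNode still leaves two copies of every block.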

“…[4, 20, 23] All the data are split into blocks that are stored on different DataNodes, and each block is replicated on several other nodes [8, 22]. When a program requires access to a file, the NameNode coordinates the relevant DataNodes to respond, and the NameNode moves the files stored in HDFS and simultaneously copies them to the other DataNodes.…”

mentioning

confidence: 99%


Performance enhancement for iterative data computing with in‐memory concurrent processing

Wen, Chen, Chiu, et al. 2019

Concurrency and Computation


Summary: The big data era has led to the development of many data analysis tools. Spark is an in‐memory processing tool suited to iterative and interactive data mining. It offers higher data‐processing performance than MapReduce, which relies on offline storage. However, in‐memory processing has drawbacks: massive in‐memory data requirements cause cross‐node data transfers, which lengthen computation time. Performance can be improved if the in‐memory process executes fewer shuffle instructions. This study therefore aims to enhance the performance of iterative applications through instruction replacement. Three empirical research cases with diverse datasets and numbers of iterations are used to modify the program. We adopt a strategy of downloading a small resilient distributed dataset (RDD) and replacing shuffle‐inducing instructions to shorten processing time, with automated code replacement based on exhaustive code matching. The experimental results show an improvement of up to 39% in execution time compared with existing in‐memory processing programs across various dataset sizes.

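The abstract does not list the exact instruction replacements, so the following rests on an assumption about the general idea: when one side of a join is small, collecting it once and looking it up locally avoids shuffling the large side. The sketch simulates partitions with plain Python lists; `shuffle_join` and `broadcast_join` are hypothetical names. In PySpark the same idea would roughly correspond to sharing `small_rdd.collectAsMap()` through `SparkContext.broadcast` instead of calling `RDD.join`, which shuffles both inputs.

```python
from collections import defaultdict


def shuffle_join(large_parts, small_parts, num_reducers=2):
    """Join by hash-partitioning BOTH inputs across reducers (models a shuffle).

    Returns (joined_records, number_of_records_shuffled).
    """
    buckets = [defaultdict(lambda: ([], [])) for _ in range(num_reducers)]
    shuffled = 0
    for side, parts in ((0, large_parts), (1, small_parts)):
        for part in parts:
            for key, value in part:
                buckets[hash(key) % num_reducers][key][side].append(value)
                shuffled += 1  # every record leaves its partition
    joined = []
    for bucket in buckets:
        for key, (lefts, rights) in bucket.items():
            joined.extend((key, (left, right)) for left in lefts for right in rights)
    return joined, shuffled


def broadcast_join(large_parts, small_parts):
    """Collect the small side once, then join each large partition locally.

    Returns (joined_records, number_of_records_moved): the small side is
    collected once and copied to every large partition; the large side stays put.
    """
    lookup = defaultdict(list)
    for part in small_parts:
        for key, value in part:
            lookup[key].append(value)
    small_size = sum(len(part) for part in small_parts)
    moved = small_size * (1 + len(large_parts))
    joined = [(key, (left, right))
              for part in large_parts
              for key, left in part
              for right in lookup.get(key, [])]
    return joined, moved


if __name__ == "__main__":
    large = [[("a", i) for i in range(500)], [("b", i) for i in range(500)]]
    small = [[("a", "x"), ("b", "y")]]
    j1, shuffled = shuffle_join(large, small)
    j2, moved = broadcast_join(large, small)
    print(len(j1), shuffled)  # 1000 1002
    print(len(j2), moved)     # 1000 6
    assert sorted(j1) == sorted(j2)
```

Both joins return the same 1,000 pairs, but the shuffle version moves every record while the broadcast version moves only the two small-side records (once to collect, once per large partition). The trade-off reverses when the "small" side no longer fits comfortably in memory.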


“…FIUT is another technique for mining frequent itemsets. The Frequent Itemset Ultrametric Tree (FIUT) [4] [11] is an exceptionally efficient approach to frequent itemset mining (FIM). It scans the database in two main phases.…”

Section: Related Work (mentioning)

confidence: 99%
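The two database scans can be sketched as below. This is a simplified sketch based on our understanding of FIUT, not the original algorithm. The first scan finds the frequent items. The second prunes each transaction to those items and groups the pruned transactions by length k, which, as we understand it, is the input FIUT organizes into per-length ultrametric trees. Tree construction and the later mining phase are omitted, `min_support` is an absolute count, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def first_scan(transactions, min_support):
    """Scan 1: count each item once per transaction; keep items with count >= min_support."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, count in counts.items() if count >= min_support}


def second_scan(transactions, frequent_items):
    """Scan 2: prune infrequent items from every transaction, then group the
    pruned transactions by length k and count duplicates: {k: {itemset: count}}."""
    groups = defaultdict(Counter)
    for t in transactions:
        pruned = frozenset(item for item in t if item in frequent_items)
        if pruned:  # transactions with no frequent item are dropped
            groups[len(pruned)][pruned] += 1
    return groups


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
    frequent = first_scan(db, min_support=2)
    print(sorted(frequent))  # ['a', 'b', 'c']
    for k, itemsets in sorted(second_scan(db, frequent).items()):
        print(k, {tuple(sorted(s)): n for s, n in itemsets.items()})
```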

An Improved Technique Of Extracting Frequent Itemsets From Massive Data Using MapReduce

Kulkarni¹, S.R², 2017

IJET


Abstract: Mining frequent itemsets is a basic and essential task in many data mining applications. Extracting frequent itemsets, together with frequent patterns and rules, benefits applications such as association rule mining, correlation analysis, and product sales and marketing. Several algorithms are used to extract frequent itemsets, such as FP-growth and Eclat. Unfortunately, these algorithms are inefficient at distributing and balancing the load when they encounter massive data, and they cannot be parallelized automatically. To overcome these issues, an algorithm is needed that supports the missing features: automatic parallelization, load balancing, and good data distribution. This paper focuses on an efficient methodology for extracting frequent itemsets with the popular MapReduce approach. The methodology includes an algorithm built on a modified Apriori algorithm, called the Frequent Itemset Mining using Modified Apriori (FIMMA) technique. It uses three mappers that work independently and concurrently through a decomposition strategy. The mappers' results are passed to the reducers using a hash table, and the reducers output the most frequent itemsets.

Keywords: Association rules, Frequent itemsets, Load balancing, MapReduce, Modified Apriori, FIMMA.

I. INTRODUCTION

Frequent itemset mining is a notable research subject underlying associations, correlations, classification, sequences, and other essential data mining tasks. Finding frequent itemsets is one of the basic computational tasks in association rule mining, where a frequent itemset is a group of items that occur together in many transactions. In association rule mining, frequent itemsets are used to characterize relationships between two itemsets, in which the presence of the first itemset is associated with the presence of the second. These rules help find interesting relationships in datasets and give insight into the process that generated the data [12]. Nowadays, data is generated by many sources, such as IT enterprises, administrations, and technological advancements, and it comes in many different structures. Handling such massive data is exceptionally difficult because it involves millions of transactions of users, products, and so on. Many techniques exist for discovering frequent itemsets in a database; they work well on ordinary datasets but are not suited to massive amounts of data. Applying frequent itemset mining to a massive database is therefore a critical task. Accelerating FIM is both complex and indispensable, because FIM spends a significant portion of its time on intensive computation and input/output. One solution to this issue is a new parallel frequent itemset mining algorithm with MapReduce, called FIMMA (Frequent Itemsets Mining with Modified Apriori) [12]. In this modern era, datasets are very large, so only ...

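The abstract gives only the outline of FIMMA (a decomposed database, three concurrent mappers, hash-table aggregation in the reducer), so the sketch below fills in the rest with assumptions. It uses plain Apriori candidate pruning because the paper's modification is not described here, splits transactions round-robin, and uses `ThreadPoolExecutor` workers as stand-ins for Hadoop mappers. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

NUM_MAPPERS = 3  # the abstract describes three mappers


def decompose(transactions, n):
    """Split the database round-robin into n disjoint parts (assumed strategy)."""
    return [transactions[i::n] for i in range(n)]


def mapper(part, candidates):
    """Count each candidate itemset in one part, using a local hash table."""
    local = Counter()
    for t in part:
        items = set(t)
        for candidate in candidates:
            if candidate <= items:
                local[candidate] += 1
    return local


def reducer(partials, min_support):
    """Merge the mappers' counts in one hash table and keep frequent itemsets."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def next_candidates(frequent, k):
    """Plain Apriori pruning: keep k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))]


def mine(transactions, min_support, num_mappers=NUM_MAPPERS):
    """Level-wise mining; each level runs the mappers concurrently, then one reducer."""
    parts = decompose(transactions, num_mappers)
    candidates = list({frozenset([item]) for t in transactions for item in t})
    result, k = {}, 1
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        while candidates:
            partials = pool.map(mapper, parts, [candidates] * len(parts))
            frequent = reducer(partials, min_support)
            result.update(frequent)
            k += 1
            candidates = next_candidates(frequent, k)
    return result


def top(frequent, n):
    """Return the n itemsets with the highest support (ties broken alphabetically)."""
    return sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0])))[:n]


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    found = mine(db, min_support=3)
    for itemset, support in top(found, 3):
        print(sorted(itemset), support)
```

Because each mapper only counts within its own part, the parts can be processed independently; correctness comes from the reducer summing all partial counts before applying the support threshold.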

“…Unfortunately, pattern mining techniques for large databases, such as FIM, suffer from long processing times (runtime). Several optimization techniques have been proposed to reduce the runtime of pattern mining [2,3]. However, these techniques cannot handle databases containing a huge number of items, where only a few of the relevant patterns are displayed to the end user.…”

Section: Introduction (mentioning)

confidence: 99%

Highly Efficient Pattern Mining Based on Transaction Decomposition

Djenouri, Lin, Nørvåg, et al. 2019

2019 IEEE 35th International Conference on Data Engineering (ICDE)


This paper introduces a highly efficient pattern mining technique called Clustering-Based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in transaction databases using clustering techniques. The set of transactions is first clustered using the k-means algorithm, which groups highly correlated transactions together. Next, the relevant patterns are derived by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one approximate and one exact. We demonstrate the efficiency and effectiveness of CBPM through a thorough experimental evaluation.

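A toy version of the CBPM pipeline (cluster the transactions, then mine each cluster) is sketched below. It rests on several assumptions. Transactions are encoded as binary item vectors. The k-means step is a plain Lloyd's loop initialized from evenly spaced transactions rather than randomly. Each cluster is mined by exhaustively counting itemsets up to size 2 against a cluster-relative threshold. This stands in for neither of the paper's approximate or exact algorithms, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def to_vectors(transactions):
    """Encode each transaction as a 0/1 vector over the sorted item universe."""
    items = sorted({item for t in transactions for item in t})
    vectors = []
    for t in transactions:
        present = set(t)
        vectors.append([1.0 if item in present else 0.0 for item in items])
    return vectors


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means with deterministic init (evenly spaced vectors as centroids)."""
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c, v=v: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels


def mine_cluster(transactions, min_ratio, max_size=2):
    """Count itemsets up to max_size; keep those in >= min_ratio of the cluster."""
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            counts.update(frozenset(c) for c in combinations(sorted(set(t)), size))
    threshold = min_ratio * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


def cbpm(transactions, k, min_ratio):
    """Cluster the transactions with k-means, then mine each non-empty cluster."""
    labels = kmeans(to_vectors(transactions), k)
    clusters = [[t for t, label in zip(transactions, labels) if label == c] for c in range(k)]
    return [mine_cluster(cluster, min_ratio) for cluster in clusters if cluster]


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b", "c"], ["x", "y"], ["x", "y", "z"]]
    for patterns in cbpm(db, k=2, min_ratio=1.0):
        print(sorted(tuple(sorted(s)) for s in patterns))
```

On this four-transaction example the two clusters separate the `a`/`b` transactions from the `x`/`y` ones, and each cluster yields its two items plus their pair with support 2. Mining per cluster keeps each search small, which is the efficiency argument the abstract makes.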
