A data structure perspective to the RDD-based Apriori algorithm on Spark

Singh, Pankaj; Singh, Sudhakar; Mishra, Pragnyaban; Garg, Rakhi

doi:10.1007/s41870-019-00337-3

Cited by 8 publications

(6 citation statements)

References 25 publications


“…Faced with high-dimensional mass big data, the exact algorithm itself is almost of no practicability due to the temporal complexity and explosion of storage space. However, some calculation platforms that can realize the temporal and spatial decomposition of data mining tasks have emerged in order to process big data, so the exact algorithm becomes feasible [12, 27, 28, 29]. The advantage of these calculation platforms lies in the fact that a big data analysis becomes feasible due to computer clusters, among which Spark reaches the highest rate at present.…”

Section: Introduction (mentioning)

confidence: 99%

Right-Hand Side Expanding Algorithm for Maximal Frequent Itemset Mining

Zhang, Zhu, et al. 2021

Applied Sciences


When it comes to association rule mining, all frequent itemsets are first found, and then the confidence level of association rules is calculated through the support degree of frequent itemsets. As all non-empty subsets in frequent itemsets are still frequent itemsets, all frequent itemsets can be acquired only by finding all maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets. In this study, an algorithm, named right-hand side expanding (RHSE), which can accurately find all MFIs, was proposed. First, an Expanding Operation was designed, which, starting from any given frequent itemset, could add items using certain rules and form some supersets of given frequent itemsets. In addition, these supersets were all MFIs. Next, this operator was used to add items by taking all frequent 1-itemsets as the starting point alternately, and all MFIs were found in the end. Due to the special design of the Expanding Operation, each MFI could be found. Moreover, the path found was unique, which avoided the algorithm redundancy in temporal and spatial complexity. This algorithm, which has a high operating rate, is applicable to the big data of high-dimensional mass transactions as it is capable of avoiding the computing redundancy and finding all MFIs. In the end, a detailed experimental report on 10 open standard transaction sets was given in this study, including the big data calculation results of million-class transactions.
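The relationship the abstract relies on (every non-empty subset of a frequent itemset is frequent, so the maximal frequent itemsets determine all the others) can be illustrated with a minimal brute-force sketch. This is not the RHSE algorithm from the paper; the function names, the toy transactions, and the support threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of frequent itemsets (toy data only, not RHSE)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
                found_any = True
        if not found_any:
            # Downward closure: with no frequent k-itemset,
            # no larger itemset can be frequent either.
            break
    return frequent


def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def expand_from_maximal(maximal):
    """Recover every frequent itemset as a non-empty subset of some MFI."""
    recovered = set()
    for m in maximal:
        for k in range(1, len(m) + 1):
            recovered.update(frozenset(c) for c in combinations(sorted(m), k))
    return recovered


# Hypothetical toy transaction database.
transactions = [
    frozenset(t)
    for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"})
]
frequent = frequent_itemsets(transactions, min_support=2)
mfis = maximal_itemsets(frequent)
recovered = expand_from_maximal(mfis)
```

On this toy data the only maximal frequent itemset is {a, b, c}, and expanding it recovers all seven frequent itemsets. Note that the expansion returns the itemsets but not their support counts, which still have to be computed separately when rule confidence is needed.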



“…Of course, the efficiency of association analysis of Apriori algorithm has been greatly improved after improvement. There is also an iconic association rule algorithm, which is called FP-Growth algorithm [3]. Compared with the original association rule algorithm, FP growth algorithm has greatly shortened the association analysis time of the algorithm.…”

Section: Introduction (mentioning)

confidence: 99%

Research on Apriori algorithm based on compression processing and hash table

Yu, Zhang 2023

Third International Conference on Machine Learning and Computer Application (ICMLCA 2022)


“…From another perspective, it is common to use multi-thread and computing power with multicore architecture [12] in supporting data processing. This raises the suspicion that multi-thread in [13] and [14] can produce better performance also in getting this frequent itemset, as well as a challenge on how to determine the best performance process architecture that is applied to which subprocesses as threads and how big is the increase for the best multi-thread architecture in a single server environment, where an experiment in multi-node server environment was proposed by [15], [16].…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of frequent itemset generation based on trie data structure in Apriori algorithm

Hodijah, Setijohatmo 2021

TELKOMNIKA


Apriori is one technique of data mining association rules that aims to extract correlations between sets of items in the transaction database. The main problem with the Apriori algorithm is the process of scanning databases repeatedly to generate itemset candidates. This research examines the combination of pruning by using the trie approach and multi-thread implementation in three algorithms to obtain frequent itemset. Trie is a data structure in the form of an ordered tree to store a set of strings where every node in the tree contains the same prefix. The use of a full combination trie (different from frequent pattern (FP) tree using links) allows the implementation of arrays and the hash calculation to achieve the addressing of itemset combination. In this research, the measure to get the address is called Hash-node calculation used to update support value. For these three alternatives, run time processing is analyzed based on the number of itemset combinations and transaction data at a certain minimum support value. The experimental results show that an algorithm that exploits resource capabilities by applying multi-thread performs almost seven times better than an algorithm implemented in single-thread in calculating hash-node. The fastest run time of the multi-thread approach is 43 minutes with 150-itemset combinations on 100,000 transaction data.
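The general idea of storing candidate itemsets in a trie and updating their support in one pass per transaction can be sketched as follows. This is a plain dictionary-based prefix trie, not the full combination trie or the Hash-node address calculation described in the abstract; the class names, method names, and sample data are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}         # item -> TrieNode
        self.support = 0           # transactions containing this itemset
        self.is_candidate = False  # True if the path to this node is a stored itemset


class ItemsetTrie:
    """Prefix trie over sorted itemsets (illustrative, not the paper's structure)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, itemset):
        node = self.root
        for item in sorted(itemset):
            node = node.children.setdefault(item, TrieNode())
        node.is_candidate = True

    def count_transaction(self, transaction):
        # Walk every ascending sub-sequence of the transaction that exists in the trie.
        self._count(self.root, sorted(set(transaction)), 0)

    def _count(self, node, items, start):
        for i in range(start, len(items)):
            child = node.children.get(items[i])
            if child is not None:
                if child.is_candidate:
                    child.support += 1
                self._count(child, items, i + 1)

    def support_of(self, itemset):
        node = self.root
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return 0
        return node.support


# Hypothetical candidate 2-itemsets and transactions.
trie = ItemsetTrie()
for candidate in ({"a", "b"}, {"a", "c"}, {"b", "c"}):
    trie.insert(candidate)
for transaction in (["a", "b", "c"], ["a", "b"], ["b", "c", "d"]):
    trie.count_transaction(transaction)
```

Because candidates that share a prefix share trie nodes, a transaction is matched against all of them in a single traversal instead of one subset test per candidate.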

