2019
DOI: 10.1007/s41870-019-00337-3

A data structure perspective to the RDD-based Apriori algorithm on Spark

Abstract: In recent years, researchers have proposed a number of efficient and scalable frequent itemset mining algorithms for big data analytics. Initially, MapReduce-based frequent itemset mining algorithms on Hadoop clusters were proposed. Although Hadoop was developed as a cluster computing system for handling and processing big data, its performance does not meet expectations for the iterative algorithms of data mining, due to its high I/O, and writing and then reading int…

Cited by 8 publications
(6 citation statements)
References 25 publications
“…Faced with high-dimensional mass big data, the exact algorithm itself is almost of no practicability due to the temporal complexity and explosion of storage space. However, some calculation platforms that can realize the temporal and spatial decomposition of data mining tasks have emerged in order to process big data, so the exact algorithm becomes feasible [12, 27, 28, 29]. The advantage of these calculation platforms lies in the fact that a big data analysis becomes feasible due to computer clusters, among which Spark reaches the highest rate at present.…”
Section: Introduction (mentioning)
confidence: 99%
“…Of course, the efficiency of association analysis of Apriori algorithm has been greatly improved after improvement. There is also an iconic association rule algorithm, which is called FP-Growth algorithm [3]. Compared with the original association rule algorithm, FP-Growth algorithm has greatly shortened the association analysis time of the algorithm.…”
Section: Introduction (mentioning)
confidence: 99%
“…From another perspective, it is common to use multi-thread and computing power with multicore architecture [12] in supporting data processing. This raises the suspicion that multi-thread in [13] and [14] can produce better performance also in getting this frequent itemset, as well as a challenge on how to determine the best performance process architecture that is applied to which subprocesses as threads and how big is the increase for the best multi-thread architecture in a single server environment, where an experiment in multi-node server environment was proposed by [15], [16].…”
Section: Introduction (mentioning)
confidence: 99%