2022
DOI: 10.1109/access.2021.3137789

Towards Enhancing the Performance of Parallel FP-Growth on Spark

Abstract: Frequent itemset mining (FIM) is a crucial tool for identifying hidden patterns in data. FP-Growth is an FIM algorithm used to find associations. As data size increases, running FIM algorithms on a single machine runs into computational problems such as excessive memory and time consumption. For these reasons, parallel and distributed processing on platforms such as Spark is essential. The parallel frequent pattern (PFP) algorithm is the implementation of FP-Growth in Spark. The main problem with PFP is …
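
The abstract names FP-Growth without showing how it works. The following is a minimal single-machine Python sketch of the classic FP-Growth idea: count item frequencies, build a prefix tree (FP-tree) of frequent items ordered by descending frequency, then mine it recursively through conditional pattern bases. It is not the paper's method and not Spark's PFP implementation; the function names and the convention that `min_support` is an absolute count are assumptions made for illustration. Because the whole tree is held in memory, it also hints at why single-machine execution struggles as data grows.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its accumulated count, and parent/child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> list of tree nodes holding that item)
    and the frequent items with their support counts.
    """
    # First pass: count item frequencies, weighting each transaction by its count.
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    frequent = {i: c for i, c in freq.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)
    # Second pass: insert each transaction's frequent items, most frequent first,
    # so that transactions sharing common items share a prefix path in the tree.
    for items, count in transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every itemset with support >= min_support."""
    header, frequent = build_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        yield tuple(sorted(suffix + (item,))), frequent[item]
        # Conditional pattern base: the prefix path above every node holding
        # `item`, weighted by that node's count.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional tree for itemsets that also contain `item`.
        yield from fp_growth(base, min_support, suffix + (item,))
```

For example, `dict(fp_growth([(t, 1) for t in [["a", "b"], ["a", "c"]]], 1))` returns every itemset that occurs in at least one of the two transactions, with its count. Each transaction is paired with a count so that the same function can be reused on the weighted conditional pattern bases during recursion.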

Cited by 3 publications (2 citation statements). References 15 publications.
“…An association rule can be expressed as X ⇒ Y, which denotes that Y is much more likely to occur whenever X occurs. The classic algorithms for association rule mining include Apriori [30] and FP-growth [31], but they cannot be directly used to analyze the data in this study because the algorithms deal with binary data (0 and 1). Therefore, the original data of EMF, population density and building density are first classified by a two-step clustering method; then the classified data are binarized to generate a transaction matrix that can be processed by the association rule mining algorithm.…”
Section: Association Analysis Methods (mentioning; confidence: 99%)
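The quoted passage turns clustered, non-binary attributes into a 0/1 transaction matrix before mining rules of the form X ⇒ Y. A minimal Python sketch of that binarization step and of the standard support and confidence measures follows. It assumes each record already carries one cluster label per attribute (the two-step clustering itself is not shown), and the attribute names and levels in the toy records are hypothetical, not data from the citing study.

```python
def binarize(records):
    """Turn records of {attribute: cluster label} into a 0/1 transaction matrix.

    Each column is one (attribute, label) pair, so a cell is 1 exactly when the
    record falls into that cluster for that attribute.
    """
    columns = sorted({(attr, label) for rec in records for attr, label in rec.items()})
    matrix = [[1 if rec.get(attr) == label else 0 for attr, label in columns]
              for rec in records]
    return columns, matrix


def support(matrix, columns, itemset):
    """Fraction of rows in which every (attribute, label) item in `itemset` is 1."""
    idx = [columns.index(item) for item in itemset]
    hits = sum(1 for row in matrix if all(row[i] for i in idx))
    return hits / len(matrix)


def confidence(matrix, columns, x, y):
    """conf(X => Y) = supp(X ∪ Y) / supp(X); undefined when supp(X) is 0."""
    return support(matrix, columns, list(x) + list(y)) / support(matrix, columns, x)


if __name__ == "__main__":
    # Hypothetical toy records: cluster labels already assigned per attribute.
    toy = [
        {"EMF": "high", "pop_density": "high", "bldg_density": "high"},
        {"EMF": "high", "pop_density": "high", "bldg_density": "low"},
        {"EMF": "low", "pop_density": "low", "bldg_density": "low"},
        {"EMF": "high", "pop_density": "low", "bldg_density": "high"},
    ]
    cols, mat = binarize(toy)
    print(cols)
    print(confidence(mat, cols, [("pop_density", "high")], [("EMF", "high")]))
```

Encoding each (attribute, label) pair as its own column keeps the matrix strictly binary, which is the input form that Apriori- or FP-Growth-style miners expect.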
“…Small files generally refer to files smaller than 64 MB. According to a 2007 study at the National Energy Research Scientific Computing Center, 43% of the more than 13 million files on a shared parallel file system are under 64 KB, and nearly 100% are under 64 MB (Petascale Data Storage Institute (2007)); further scientific applications consisting of a large number of small files are described in (Carns et al. [7], Essam et al. [8]). However, with massive small-file datasets, the FP-tree constructed by the Parallel FP-Growth (PFP) algorithm cannot fit into memory, which often causes problems such as memory overflow and heavy communication overhead.…”
Section: Introduction (mentioning; confidence: 99%)