2017
DOI: 10.1007/s11227-017-1963-4

HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing

Cited by 55 publications (19 citation statements)
References: 21 publications
“…The analysis results from each reducer were further aggregated to generate association rules for the entire dataset. In (13) the authors used Hybrid Frequent Itemset Mining (HFIM) technique in Spark to optimize the execution time. HFIM uses vertical and horizontal layout of the dataset to find the association.…”
Section: Background Study (mentioning)
confidence: 99%
“…DFIMA (Distributed Frequent Itemset Mining Algorithm) [16] is a Spark-based Apriori algorithm that uses a Boolean vector for the frequent items and a matrix-based pruning method to reduce the size of candidates. HFIM (Hybrid Frequent Itemset Mining) [17] is also an Apriori-based algorithm along with vertical format of the dataset that reduces the scanning of the dataset. It uses both horizontal and vertical dataset obtained by eliminating infrequent items, where horizontal dataset is distributed across the worker nodes and vertical dataset is shared.…”
Section: Related Work (mentioning)
confidence: 99%
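
The hybrid layout described in the statement above can be illustrated with a small single-machine sketch. This is not the HFIM authors' implementation and does not use Spark: the function names `build_layouts` and `frequent_itemsets`, the in-memory `dict`/`set` representation, and the absolute `min_support` count are all assumptions made for illustration. It shows infrequent items being eliminated, a horizontal dataset (the part that would be distributed across worker nodes) and a vertical dataset (the part that would be shared) being built, and Apriori-style candidates being counted from the vertical layout instead of by rescanning the transactions.

```python
from collections import defaultdict
from itertools import combinations


def build_layouts(transactions, min_support):
    """Return (horizontal, vertical) layouts restricted to frequent items."""
    # Vertical layout: item -> set of transaction ids containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    # Eliminate infrequent items before building either layout.
    vertical = {i: t for i, t in tidsets.items() if len(t) >= min_support}
    # Horizontal layout: each transaction, keeping only frequent items.
    horizontal = [[i for i in sorted(set(t)) if i in vertical]
                  for t in transactions]
    return horizontal, vertical


def frequent_itemsets(vertical, min_support):
    """Level-wise (Apriori-style) mining that counts support from tidsets."""
    result = {frozenset([i]): len(t) for i, t in vertical.items()}
    current = list(result)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        current = []
        for c in candidates:
            # Support = size of the intersection of the items' tidsets,
            # so the transactions are not scanned again at this level.
            support = len(set.intersection(*(vertical[i] for i in c)))
            if support >= min_support:
                result[c] = support
                current.append(c)
        k += 1
    return result


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
horizontal, vertical = build_layouts(transactions, min_support=3)
itemsets = frequent_itemsets(vertical, min_support=3)
```

In a Spark setting, the `horizontal` list is the piece one would expect to partition across workers and the `vertical` dictionary the piece to share with them; how HFIM combines the two during candidate generation is not specified in the excerpts here, so the sketch only covers support counting.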
“…So, a number of data mining and machine learning algorithms have been re-designed on the Spark RDD framework. FIM algorithms on the Spark have been also proposed by many authors [12][13][14][15][16][17][18], where most of the efforts have been made on the efficient implementations of Apriori-based FIM algorithm on the Spark. The efficiency of the Spark-based Apriori algorithms extensively depend on the way it is parallelized on the Spark, and the underlying data structures used to store and compute frequent itemsets.…”
Section: Introduction (mentioning)
confidence: 99%
“…HFIM algorithm [29] is another Spark-based implementation of the Apriori algorithm for various data sets, which uses the vertical layout of the data set to solve the problem of scanning the dataset in each iteration. It is implemented on the Spark framework, integrating the concept of resilient distributed datasets and in-memory processing to optimize the processing time of the operation.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this section, the DisPrePost algorithm has been compared to two advanced algorithms, HPrePostPlus [26] and the well-known HFIM [29]. DisPrePost is the first implementation of the PrePost algorithm in the Spark framework, HPrePostPlus is a recent implementation of the Hadoop-based PrePost parallel algorithm [26] with good results, and HFIM is a typical implementation of the Spark-based Apriori parallel algorithm [29] with good performance. We evaluated speed performance by analyzing runtime and scalability.…”
Section: Performance Evaluation (mentioning)
confidence: 99%