RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

Singh, Pankaj; Singh, Sudhakar; Mishra, Pragnyaban; Garg, Rakhi

doi:10.1007/978-3-030-37051-0_85

Cited by 8 publications

(6 citation statements)

References 27 publications

(62 reference statements)


“…RDD-Eclat [36] is a parallel Eclat algorithm entitled RDD-Eclat and the implementation of its five variations on the Spark RDD framework. EclatV1 is the first version, while the others are EclatV2, EclatV3, EclatV4, and EclatV5.…”

Section: Vertical Layout-based Algorithms (mentioning)

confidence: 99%

An Efficient Spark-Based Hybrid Frequent Itemset Mining Algorithm for Big Data

Al-Bana, Salah, Othman (2022). Data


Frequent itemset mining (FIM) is a common approach for discovering hidden frequent patterns in transactional databases, with uses in prediction, association rules, classification, etc. Apriori is an elementary, iterative FIM algorithm for finding frequent itemsets. It scans the dataset multiple times to generate frequent itemsets of different cardinalities, so its performance degrades as the data grows because of the repeated dataset scans. Eclat is a scalable version of the Apriori algorithm that utilizes a vertical layout. The vertical layout has many advantages: it avoids scanning the dataset multiple times and carries the information needed to find the support of each itemset. In a vertical layout, itemset support is obtained by intersecting sets of transaction ids (tidsets/tids) and pruning irrelevant itemsets. However, when the tids become too big for memory, the algorithm's efficiency suffers. In this paper, we introduce SHFIM (spark-based hybrid frequent itemset mining), a three-phase algorithm that utilizes both horizontal and vertical layouts and uses diffsets instead of tidsets, keeping track of the differences between transaction ids rather than the intersections. Moreover, some improvements are developed to decrease the number of candidate itemsets. SHFIM is implemented and tested on the Spark framework, which utilizes the RDD (resilient distributed datasets) concept and in-memory processing to address problems of the MapReduce framework. We compared the performance of SHFIM with Spark-based Eclat and dEclat algorithms on four benchmark datasets. Experimental results showed that SHFIM outperforms the Spark-based Eclat and dEclat algorithms on both dense and sparse datasets in terms of execution time.
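The tidset and diffset ideas mentioned in this abstract can be illustrated with a minimal single-machine sketch. It is not SHFIM or RDD-Eclat and does not use Spark; the function names (`to_vertical`, `support_by_tidset`, `support_by_diffset`) and the toy transactions are hypothetical, chosen only for illustration. It assumes the usual dEclat relation that the support of a combined itemset equals the support of one parent minus the size of the difference between the two parents' tidsets.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert a horizontal layout (list of item sets) into a vertical one.

    Returns a dict mapping each item to its tidset, i.e. the set of
    transaction ids in which the item occurs.
    """
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def support_by_tidset(tids_a, tids_b):
    """Eclat style: support of the combined itemset is the size of the
    intersection of the two tidsets."""
    return len(tids_a & tids_b)


def support_by_diffset(tids_a, tids_b):
    """dEclat style: keep only the ids in A's tidset that are missing
    from B's tidset, then subtract their count from A's support."""
    diffset = tids_a - tids_b
    return len(tids_a) - len(diffset)


# Toy data (hypothetical), transaction ids are the list positions 0..4.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]

vertical = to_vertical(transactions)

# Both routes give the same support for the itemset {a, c}.
support_ac_tidset = support_by_tidset(vertical["a"], vertical["c"])
support_ac_diffset = support_by_diffset(vertical["a"], vertical["c"])
print(support_ac_tidset, support_ac_diffset)
```

The two functions always agree; the diffset route only changes what has to be stored. On dense data the difference set tends to be much smaller than the intersection, which is the memory concern the abstract raises.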



“…The rule search space is effectively divided into subspace sets through concept lattice and equivalence relationships. The support calculation of each itemset does not require repeated retrieval of the entire dataset [16][17][18][19]. The main idea of using Eclat framework to study learning behaviors need the support of big data set of learning behaviors, through data transposition and standardization processing, we can get the itemsets and the transaction set.…”

Section: Frequent Itemsets Mining Based On Eclat Framework (mentioning)

confidence: 99%

Improved Probabilistic Frequent Itemset Analysis Strategy of Learning Behaviors Based on Eclat Framework

Xia (2022). Advances in Decision Making


The interactive learning environment is the key support for educational decision making, and the corresponding analytics and methodology are an important part of educational technology research and development. As an important part of this work and a research challenge, learning behaviors are uncertain and produce complex data relationships, which makes the learning analysis process more difficult. This chapter studies the feasibility of applying the Eclat framework to educational decision making and obtains the corresponding data analysis results. We take probabilistic frequent itemsets and association rules as research objectives, and extract and standardize multiple data subsets. Based on the Eclat framework and using the vertical data format, we design and improve the models and algorithms used in data management and processing. The results show that the improved models and algorithms are effective and feasible. On the premise of ensuring robustness and stability, the mining quality of probabilistic frequent itemsets and association rules is guaranteed, which is conducive to the construction of the key execution topology of learning behaviors and improves the accuracy and reliability of data association analysis and decision prediction. The analysis methods and demonstration processes can provide references for the study of interactive learning environments, as well as decision suggestions and predictive feedback.


“…horizontal database record or breadth first searching [6] and vertical database record [7][8] or depth first searching. When the horizontal record drawback issues are subjected to storage and memory, thus contemporary works are then utilized on the vertical database for rules mining algorithms that are proposed in [8][9][10]. In ARM, the so-called state-of-the-art frequent/infrequent models are Apriori [1,6] underlying on horizontal records.…”

Section: Related Work (mentioning)

confidence: 99%

“…To the best of our knowledge, Equivalent Class Transformation (Eclat) algorithm [8] outperforms because of its 'fast' intersection of its transaction-id-list to determine the minimum or maximum support threshold [9,14]. The Eclat followers and the invariants are [9][10][11][12][13], [15][16][17][18][19][20], [22] and [26].…”

Section: Related Work (mentioning)

confidence: 99%

CRS-iEclat: Implementation of Critical Relative Support in iEclat Model for Rare Pattern Mining

Bakar, Man, Abdullah et al. (2021). IJACSA


The purpose of this research is to develop a performance enhancement for the Incremental Eclat (iEclat) model by embedding Critical Relative Support (CRS) in the mining of infrequent itemsets. The CRS measure acts as an interestingness measure (filter) in the iEclat model, which comprises the i-Eclat-diffset, i-Eclat-sortdiffset, and i-Eclat-postdiffset algorithms for infrequent (rare) itemset mining. Association rules are used to reveal the relationships among itemsets in a transactional database. The task of association rule mining is to discover whether frequent itemsets or infrequent patterns exist in the database and, if so, whether an interesting relationship between these frequent or infrequent itemsets can reveal a new pattern analysis for future decision making. For both frequent and infrequent itemsets, the persisting issues are the execution time needed to display the rules and the high memory consumption during the mining process. The CRS-iEclat engine is proposed to overcome these issues. Prior to experimentation, results indicate that CRS-iEclat outperforms iEclat from 54% to 100% accuracy on execution time (ET) in selected databases, showing the improvement in ET efficiency.

