A non-group parallel frequent pattern mining algorithm based on conditional patterns

Kuang, Zhejun; Zhou, Hang; Zhou, Dongdai; Zhou, Jingfang; Yang, Kun

doi:10.1631/fitee.1800467

Cited by 5 publications (5 citation statements)

References 38 publications

“…Figure 5 presents the runtime of DT-DPM and both baseline MapReduce-based models using big databases for solving both Itemset and Sequential Pattern Mining problems. The baseline methods used are FiDoop-DP [15] and NG-PFP: NonGroup Parallel Frequent Pattern mining [67] for itemset mining, and PrefixSpan-S [66] for sequence mining. The results reveal that our model outperforms the baseline MapReduce-based models in terms of computational time for both itemset and sequence mining.…”

[Table 2: Speedup of pattern mining algorithms with and without the DT-DPM framework using different numbers of mappers (2, 4, 8, 16, 32); columns: Problem, Database, Without DT-DPM, With DT-DPM; rows include the pumsb and mushroom databases.]

Section: Results on Big Databases (mentioning, confidence: 99%)

“…However, it is very sensitive to the data distribution. Kuang et al. [67] proposed a parallel implementation of the FP-Growth algorithm in Hadoop that removes the data redundancy between the different data partitions, which allows the transactions to be handled in a single pass. Sumalatha et al. [68] introduced the concept of distributed temporal high utility sequential patterns and proposed an intelligent strategy that creates a time-interval utility data structure for evaluating the candidate patterns.…”

Section: Related Work (mentioning, confidence: 99%)

A general-purpose distributed pattern mining system

et al. 2020

This paper explores five pattern mining problems and proposes a new distributed framework called DT-DPM: Decomposition Transaction for Distributed Pattern Mining. DT-DPM addresses the limitations of existing pattern mining approaches by reducing the enumeration search space. It derives the relevant patterns by studying the correlations among the transactions. It first decomposes the set of transactions into several clusters of different sizes, and then exploits heterogeneous architectures, including MapReduce, single CPU, and multi CPU, according to the density of each subset of transactions. To evaluate the DT-DPM framework, extensive experiments were carried out on five pattern mining problems (FIM: Frequent Itemset Mining, WIM: Weighted Itemset Mining, UIM: Uncertain Itemset Mining, HUIM: High Utility Itemset Mining, and SPM: Sequential Pattern Mining). Experimental results reveal that DT-DPM improves the scalability of the pattern mining algorithms on large databases and outperforms the baseline parallel pattern mining algorithms on big databases.
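
The abstract says DT-DPM routes each cluster of transactions to MapReduce, a single CPU, or multiple CPUs based on its density, but it does not define density or the routing rule. The sketch below is a hypothetical illustration only: it assumes density is the mean transaction length divided by the number of distinct items in the cluster, and that denser clusters (which tend to yield more candidate patterns) go to more parallel backends. The names `Cluster`, `density`, `choose_backend` and both thresholds are invented for illustration, and the decomposition step itself is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """A subset of transactions produced by a decomposition step (not shown)."""
    transactions: List[List[str]] = field(default_factory=list)


def density(cluster: Cluster) -> float:
    """Hypothetical density measure: mean transaction length / distinct items."""
    distinct = {item for t in cluster.transactions for item in t}
    if not distinct:
        return 0.0
    mean_len = sum(len(t) for t in cluster.transactions) / len(cluster.transactions)
    return mean_len / len(distinct)


def choose_backend(cluster: Cluster, sparse_max: float = 0.1,
                   dense_min: float = 0.5) -> str:
    """Hypothetical routing rule; thresholds are illustrative assumptions,
    not values from the DT-DPM paper."""
    d = density(cluster)
    if d < sparse_max:
        return "single_cpu"   # sparse cluster: small search space
    if d < dense_min:
        return "multi_cpu"    # moderately dense cluster
    return "mapreduce"        # dense cluster: largest search space
```

For example, a cluster of two identical transactions `["a", "b"]` has density 1.0 under this assumed measure and is routed to `"mapreduce"`.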

“…However, this involved generating multiple splits of transactions, which increased the data to be transferred during shuffling. In this regard, NG-PFP [30] eliminated the formation of the G-list from the F-list as a means of reducing the shuffling cost and improving efficiency.…”

Section: Related Work (mentioning, confidence: 99%)
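
In grouped parallel FP-Growth, the F-list (frequent items sorted by descending support) is split into groups (the G-list), and each transaction is copied to every group it touches. The quotation above says NG-PFP drops the G-list. The sketch below shows one way to key the shuffle directly by item instead of by group: the mapper emits, for each frequent item of a sorted transaction, the prefix that precedes it (the item's conditional pattern in FP-Growth terms), and the shuffle collects each item's conditional pattern base. This is an assumed illustration of the non-group idea, not the authors' algorithm; in particular it does not reproduce NG-PFP's redundancy reduction, and emitting every prefix can send more data than a grouped scheme. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_f_list(transactions, min_support):
    """Return frequent items sorted by descending support (ties broken by item)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


def map_conditional_patterns(transaction, rank):
    """Mapper: for each frequent item, emit (item, prefix), where the prefix is
    the items that precede it in F-list order (its conditional pattern)."""
    items = sorted((i for i in set(transaction) if i in rank), key=rank.__getitem__)
    for pos in range(1, len(items)):
        yield items[pos], tuple(items[:pos])


def shuffle_by_item(transactions, min_support):
    """Simulate the shuffle: group conditional patterns by item key, with no
    G-list, so each item's conditional pattern base is collected in one place."""
    f_list = build_f_list(transactions, min_support)
    rank = {item: r for r, item in enumerate(f_list)}
    bases = defaultdict(Counter)
    for t in transactions:
        for item, prefix in map_conditional_patterns(t, rank):
            bases[item][prefix] += 1
    return f_list, dict(bases)
```

With transactions `[["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c"]]` and `min_support=2`, the F-list is `["a", "b", "c"]` and the conditional pattern base of `"b"` is `{("a",): 2}`; each reducer could then mine its item's base independently.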

Mining high utility itemsets with time‐aware scheduling using Apache Spark

Brahmavar; Venkatarama; Maiya (2022)

Concurrency and Computation

Summary: Over the last decade, Market Basket Analysis has been propelled by the growth of revenue information. This task, termed high utility itemset mining (HUIM), considers the purchase quantity and unit profit of the items in the transaction database. Although several sequential algorithms to mine HUIs exist, their performance degrades as the database grows. Distributed computing solutions such as Apache Hadoop and Apache Spark have proven effective in alleviating this bottleneck. In this regard, the current study develops a parallel workflow that adapts a single-phase tree-based algorithm, the single phase utility computation (SPUC) algorithm, to a Spark cluster. Based on the time taken to mine individual conditional pattern bases in SPUC, the parallel SPUC (PSPUC) algorithm uses an assignment strategy that partitions the search space across the cluster. Experimental evaluation on real and synthetic datasets demonstrates that PSPUC outperforms the PHUI-Growth algorithm. In addition, PSPUC with the time-aware assignment strategy completes mining faster than a random assignment of items. A linear speedup of PSPUC is also demonstrated.
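
The abstract describes assigning conditional pattern bases to workers according to how long each takes to mine, but does not give the exact rule. The sketch below substitutes a standard heuristic, greedy longest-processing-time-first (LPT) scheduling: items are taken in decreasing order of estimated mining time and each goes to the currently least-loaded worker. LPT, the function name `assign_items`, and the per-item time estimates are assumptions for illustration, not PSPUC's published strategy, and no Spark API is involved.

```python
import heapq


def assign_items(item_times, num_workers):
    """Greedy LPT: give each item (longest estimated time first) to the
    worker with the smallest accumulated load."""
    heap = [(0.0, w) for w in range(num_workers)]  # (load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    # Sort by descending time; ties broken by item name for determinism.
    for item, cost in sorted(item_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        assignment[w].append(item)
        heapq.heappush(heap, (load + cost, w))
    return assignment
```

For estimated times `{"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}` on two workers, this returns `{0: ["a", "d"], 1: ["b", "c", "e"]}`, giving both workers a load of 8; a random assignment offers no such balance guarantee.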

“…An incremental utility-based pattern mining algorithm was put forth here. To extract data from big datasets, a binary-based technique was designed [4]. Threads collaborated to generate frequent itemsets in a big data environment.…”

Section: Introduction (mentioning, confidence: 99%)

A genetic algorithm coupled with tree-based pruning for mining closed association rules

Poovan; Acharya; Reddy (2023)

IJECE

Due to the voluminous number of itemsets that are generated, the association rules extracted from these itemsets contain redundancy, and designing an effective approach to address this issue is of paramount importance. Although multiple algorithms have been proposed in recent years for mining closed association rules, most of them underperform in terms of run time or memory. Another challenging issue is the nature of the dataset: some existing algorithms perform well on dense datasets, while others perform well on sparse datasets. This paper aims to handle these drawbacks by using a genetic algorithm for mining closed association rules. Recent studies have shown that genetic algorithms perform better than conventional algorithms due to their bitwise crossover and mutation operations. Bitwise operations are predominantly faster than conventional approaches, and bits consume less memory, thereby improving the overall performance of the algorithm. To address the redundancy in the mined association rules, a tree-based pruning algorithm has been designed, which works on the principle of minimal antecedent and maximal consequent. Experiments have shown that the proposed approach works well on both dense and sparse datasets while surpassing existing techniques in run time and memory.
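
The abstract attributes the genetic algorithm's speed and memory savings to bitwise crossover and mutation. The sketch below assumes each chromosome is an itemset encoded as an integer bitmask (bit k set means item k is present) and uses textbook single-point crossover, per-bit XOR mutation, and AND-based support counting. The encoding and operators are assumptions for illustration; the paper's exact operators and its tree-based pruning are not shown.

```python
import random


def crossover(parent_a, parent_b, point, n_bits):
    """Single-point crossover on bitmasks: child1 takes the low `point` bits
    from parent_a and the remaining bits from parent_b (child2 the reverse)."""
    low_mask = (1 << point) - 1
    high_mask = ((1 << n_bits) - 1) & ~low_mask
    child1 = (parent_a & low_mask) | (parent_b & high_mask)
    child2 = (parent_b & low_mask) | (parent_a & high_mask)
    return child1, child2


def mutate(chromosome, n_bits, rate, rng):
    """Flip each bit independently with probability `rate` using XOR."""
    for bit in range(n_bits):
        if rng.random() < rate:
            chromosome ^= 1 << bit
    return chromosome


def support(itemset_mask, transaction_masks):
    """Count transactions that contain every item of the itemset (AND test)."""
    return sum(1 for t in transaction_masks if (t & itemset_mask) == itemset_mask)
```

For example, `crossover(0b1111, 0b0000, 2, 4)` yields `(0b0011, 0b1100)`, and `support(0b011, [0b111, 0b011, 0b101])` is 2, since only the first two transactions contain both items 0 and 1. A fitness function built on `support` could then rank chromosomes between generations.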
