Study on GSP algorithm based on Hadoop

Li, Huanhuan; Zhou, Xiaofeng; Pan, Chaojun

doi:10.1109/iceiec.2015.7284549

Cited by 7 publications

(5 citation statements)

References 5 publications


“…This algorithm was called Ha-GSP. 42 Another GSP variant was implemented on Spark by Yu et al, 43 where they introduced two different database partitioning solutions for the unbalanced loading issue. They initially loaded the database from HDFS to Spark RDDs (Resilient Distributed Datasets), and saved the interim results in the RDDs to minimize input-output overhead.…”

Section: Parallelize the Spade Algorithm With The Recursive Dynamic Load (mentioning)

confidence: 99%

“…In Hadoop, Li et al presented an algorithm that utilizes divide and conquer capabilities provided in the MapReduce programming model. This algorithm was called Ha‐GSP 42 . Another GSP variant was implemented on Spark by Yu et al, 43 where they introduced two different database partitioning solutions for the unbalanced loading issue.…”

Section: Fundamental Concepts and Literature Review (mentioning)

confidence: 99%


On the big data processing algorithms for finding frequent sequences

Can, Zaval, Uzun‐Per et al. 2023

Concurrency and Computation


Sequential pattern mining algorithms extract trendy sequence appearances inside ordered transactional datasets such as market basket datasets. There is a lack of research employing big data processing techniques to locate frequent sequences on large‐scale datasets. Furthermore, there is a need for optimized sequential pattern mining algorithms that run on ordered one‐dimensional sequences. We also observe a lack of sequential pattern search studies in the literature, where the focus is centered around multi‐dimensional data sequences. Existing approaches that deal with ordered one‐dimensional datasets suffer from scalability issues as the amount of data to be analyzed is enormous. This research investigates the big data processing techniques used to find frequent sequences in large‐scale datasets. It also proposes a scalable sequence pattern mining algorithm called Sequential Pattern Acquisition by Reducing Search Space (SPARSS) designed for distributed data processing systems that efficiently handle large datasets containing sequential one‐element data. It introduces a prototype implementation of SPARSS and provides information on the SPARSS's memory and time requirements, which were calculated as part of experimental studies on a real‐world dataset. The results confirm our expectations and demonstrate SPARSS's superior scalability and run‐time efficiency compared to other distributed algorithms.


“…Sequential Pattern Mining algorithms are also implemented and parallelized on Hadoop and Spark to get rid of the load and serial operation overhead. [20][21][22]48,49 Yu et al proposed a distributed GSP (DGSP) algorithm based on MapReduce on Hadoop. 20 The DGSP algorithm partitions the database and assign jobs to map workers and so optimizes the workload balance.…”

Section: Literature Review (mentioning)

confidence: 99%

“…Li et al proposed Ha-GSP on Hadoop by using Map and Reduce functions of MapReduce. 21 Yu et al implemented the GSP algorithm on Spark. 22 For the imbalance load problem, two different database partitioning strategies were proposed.…”

Section: Literature Review (mentioning)

confidence: 99%
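
The statements above describe GSP variants that spread support counting across Map and Reduce functions. As a rough illustration only, and not the Ha-GSP or DGSP implementation, the following sketch simulates one counting pass in plain Python. It assumes simplified sequences of single items with no time constraints; the names `map_partition`, `reduce_counts`, and `count_support` are hypothetical and do not come from the cited papers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Candidate = Tuple[str, ...]


def is_subsequence(candidate: Candidate, sequence: List[str]) -> bool:
    """True if the candidate's items occur in the sequence in the same order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in candidate)


def map_partition(
    partition: Iterable[List[str]], candidates: List[Candidate]
) -> Iterator[Tuple[Candidate, int]]:
    """Map step (assumed form): emit (candidate, 1) per supporting sequence."""
    for sequence in partition:
        for candidate in candidates:
            if is_subsequence(candidate, sequence):
                yield candidate, 1


def reduce_counts(
    pairs: Iterable[Tuple[Candidate, int]], min_support: int
) -> Dict[Candidate, int]:
    """Reduce step (assumed form): sum counts, keep frequent candidates."""
    totals: Dict[Candidate, int] = defaultdict(int)
    for candidate, count in pairs:
        totals[candidate] += count
    return {c: n for c, n in totals.items() if n >= min_support}


def count_support(
    partitions: List[List[List[str]]],
    candidates: List[Candidate],
    min_support: int,
) -> Dict[Candidate, int]:
    """Run the map step on each partition, then a single reduce step."""
    pairs: List[Tuple[Candidate, int]] = []
    for partition in partitions:  # each partition stands in for one map worker
        pairs.extend(map_partition(partition, candidates))
    return reduce_counts(pairs, min_support)


if __name__ == "__main__":
    # Toy database split into two partitions (hypothetical data).
    partitions = [
        [["a", "b", "c"], ["a", "c"]],
        [["b", "c"], ["a", "b"]],
    ]
    candidates = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
    print(count_support(partitions, candidates, min_support=2))
```

On a real cluster the partitions would be processed in parallel and the pairs shuffled by key before reduction; here both happen sequentially in one process to keep the sketch self-contained.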

“…So, sequential pattern mining algorithms may also be used to search sequences in a dataset. [9][10][11] As the data grows and the data is stored as big data, several algorithms are proposed on Hadoop 12 and Spark 13 based on Apriori, [14][15][16] frequent pattern growth (FP-Growth), [17][18][19] generalized sequential pattern (GSP), [20][21][22] and PrefixSpan. 22,23 These algorithms run faster and are more scalable than the original versions.…”

Section: Introduction (mentioning)

confidence: 99%


Scalable recommendation systems based on finding similar items and sequences

Uzun‐Per, Gurel, Can et al. 2022

Concurrency and Computation


The rapid growth in the airline industry, which started in 2009, continued until the COVID-19 era, with the annual number of passengers almost doubling in 10 years. This situation has led to increased competition between airline companies, whose profitability has decreased considerably. They aimed to increase their profitability by making services like seat selection, excess baggage, Wi-Fi access optional under the name of ancillary services. To the best of our knowledge, there is no recommendation system for recommending ancillary services for airline companies. Also, to the best of our knowledge, there is no testing framework to compare recommendation algorithms considering their scalabilities and running times. In this paper, we propose a framework based on Lambda architecture for recommendation systems that run on a big data processing platform. The proposed method utilizes association rule and sequential pattern mining algorithms that are designed for big data processing platforms. To facilitate testing of the proposed method, we implement a prototype application. We conduct an experimental study on the prototype to investigate the performance of the proposed methodology using accuracy, scalability, and latency related performance metrics. The results indicate that the proposed method proves to be useful and has negligible processing overheads.
