Performance optimization of MapReduce-based Apriori algorithm on Hadoop cluster

Singh, Sudhakar; Garg, Rakhi; Mishra, Pragnyaban

doi:10.1016/j.compeleceng.2017.10.008

Cited by 60 publications

(32 citation statements)

References 14 publications

(31 reference statements)


“…As to the two mining algorithms, we can know that the most time-consuming part is to derive fuzzy large itemsets. To deal with this problem, MapReduce-based algorithms can be employed to improve the efficiency [25], [31]. For example, Martín et al presented a generic MapReduce framework for rule discovery [25], and Singh et al proposed a MapReduce-based Apriori algorithm for performance optimization on a Hadoop cluster [31].…”

Section: E Discussion (mentioning)

confidence: 99%

Post-Analysis Framework for Mining Actionable Patterns Using Clustering and Genetic Algorithms

Chen, Hong, et al. 2019. IEEE Access


Mining association rules is an important technique in data analysis. Many approaches for rule analysis have been designed to address different problems. Among them, some works developed from multiobjective genetic algorithms derive a set of Pareto solutions, each of which contains a set of membership functions for fuzzy data mining from quantitative transactions with taxonomy. However, because more than one solution exists in a Pareto set, finding a method to determine the appropriate membership functions and combine them with useful knowledge for mining actionable patterns (such as fuzzy generalized association rules and fuzzy utility itemsets) is a useful research problem. Hence, this paper presents a post-analysis-based genetic-fuzzy mining (PA-GFM) framework for mining actionable patterns that involves two phases: membership-function mining and actionable pattern mining. In the first phase, an existing approach is utilized to derive the Pareto solutions with objective functions. In the second phase, a clustering technique using clustering attributes selected by the users is employed to group the Pareto solutions. The representative solution from each group is then exploited to mine actionable patterns based on the users' requirements. Experiments were conducted on both a simulated dataset and a real one to investigate the performance of the PA-GFM framework.

Index terms: Clustering algorithms, domain-driven data mining, fuzzy generalized association rules, fuzzy utility itemsets, multiobjective genetic algorithms.


“…In other words, if {A} is not frequent, then {AB} is not frequent; if {AB} is frequent, then {A} and {B} are frequent. So, the non-relevant sets are removed early in the search space [43]. In our database, we have about 100,000 real values measured over two years, and at every hour (about 17,280 transactions).…”

Section: Principle Of Anti-monotony (mentioning)

confidence: 99%

Explainability with Association Rule Learning for Weather Forecast

2021


The reliability of the weather forecast models is a complex issue since it depends on numerous parameters and the technical infrastructure which supports them. In doing so, there is a need for advanced works oriented towards a better understanding of these models and the analysis of main associated parameters. Our approach is to study the applicability of the extracted association rules to provide a clearer understanding of atmospheric exchanges. In this work, the proposed methodology is based on the discovery of the interesting interpretable relationships between measured meteorological parameters at the Atmospheric Research Center of Lannemezan (South-West of France). In the preprocessing step, the proposed method is considered to be effectively flexible to account for data uncertainties, unlike the majority of classical evaluation methods mainly directed towards the reduction of variables and data redundancy. In postprocessing, the advantage of our approach is that the extracted rules are a metamodeling of interpretable useful knowledge for the clarity and conciseness of its representation. Moreover, in the processing, the interpretability in data sciences is recent and still in its infancy. The generated association rules with their statistical and semantic interpretations have globally highlighted the possibilities of explicit analysis of meteorological parameters. This study showed that among the generated relevant rules, three parameters (temperature, humidity, wind speed) have a high frequency in the antecedents of the rules and that the only consequence is rain. This is useful for the identification of potential improvements and gaps in the existing models of atmospheric observations, in particular, to understand the related parameterizations to the productivity of the rain phenomenon.
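The anti-monotone property quoted in the citation statement for this entry (if {A} is not frequent, then {AB} is not frequent) is what lets Apriori discard candidates before counting them. The following is a minimal, single-machine sketch of that pruning step, not the implementation used in any of the cited papers. The function name `frequent_itemsets`, the absolute-count `min_support` parameter, and the toy transactions in the usage note are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori sketch; min_support is an absolute count (assumed)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step (anti-monotonicity): a candidate survives only if
        # every one of its (k-1)-subsets is itself frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count support only for the surviving candidates.
        current = {c for c in candidates if support(c) >= min_support}
    return frequent
```

As a usage example, calling `frequent_itemsets([{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}], 2)` keeps {A,B} and {A,C} but never counts {A,B,C}, because its subset {B,C} is infrequent and the prune step removes the candidate first.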


“…There is a small class of MapReduce-based Apriori algorithms [17,22,28,36] that are distinct from all of the above. Each aims to improve the performance over the traditional level-wise sequential or parallel Apriori but because they are focused in different aspects of the development (e.g., cloud storage, intelligent search), they have never been compared to realize their common property.…”

Section: Apriori Algorithms: Background and Remarks (mentioning)

confidence: 99%

On using MapReduce to scale algorithms for Big Data analytics: a case study

2019


Scale adds cost. It also adds complexity and can make even the simplest computing infeasible. Many data analytics algorithms are originally designed for in-memory data. When facing with huge volume of data, these algorithms fail to scale due to limitation of processing capacity, storage capacity and operations on a single machine. Thus, to improve scalability and efficiency, parallel and distributed algorithms are developed to
