A technique for parallel query optimization using MapReduce framework and a semantic-based clustering method

Azhir, Elham; Navimipour, Nima Jafari; Hosseinzadeh, Mehdi; Sharifi, Arash; Darwesh, Aso Mohammad

doi:10.7717/peerj-cs.580

“…The term frequency (TF) method [17] and cosine measure with a feature representation of SQL query language are used in the presented access plan recommendation method. In the present article, a parallel MapReduce model is applied to speed up the query clustering operation in Apache Hadoop [18]. Furthermore, the performance of the presented access plan recommendation method [18] is improved using the implementation in Apache Spark, which is an in-memory distributed data processing engine.…”

Section: Introduction

confidence: 99%

“…In the present article, a parallel MapReduce model is applied to speed up the query clustering operation in Apache Hadoop [18]. Furthermore, the performance of the presented access plan recommendation method [18] is improved using the implementation in Apache Spark, which is an in-memory distributed data processing engine. The following list underlines the article's key contributions:…”

Section: Introduction

confidence: 99%
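
The citation statements above mention a term frequency (TF) representation of SQL queries compared with a cosine measure. The following is a minimal sketch of that general idea, not the authors' implementation: the tokenizer (lowercased identifiers and keywords, with numeric literals dropped), the helper names `tf_vector` and `cosine_similarity`, and the sample queries are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tf_vector(query: str) -> Counter:
    """Count how often each keyword/identifier token occurs in a SQL string.

    Assumption: tokens are lowercased words starting with a letter or
    underscore, so numeric literals such as 30 are ignored.
    """
    tokens = re.findall(r"[a-z_][a-z0-9_]*", query.lower())
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty query shares nothing with any other query
    return dot / (norm_a * norm_b)


# Hypothetical queries: q1 and q2 differ only in a literal, q3 differs in shape.
q1 = "SELECT name FROM users WHERE age > 30"
q2 = "SELECT name FROM users WHERE age > 45"
q3 = "SELECT total FROM orders WHERE status = 'open'"

sim_same_shape = cosine_similarity(tf_vector(q1), tf_vector(q2))
sim_other_shape = cosine_similarity(tf_vector(q1), tf_vector(q3))
print(sim_same_shape, sim_other_shape)
```

Under these assumptions, queries that differ only in a literal value get a similarity close to 1, which is the property a clustering step would need in order to group queries that could plausibly reuse the same execution plan.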

Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Azhir, Hosseinzadeh, Khan et al. 2022

Mathematics

Self Cite


Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the query optimizer divides the query space into clusters. However, traditional clustering algorithms take a significant amount of execution time to cluster large query datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quantities of data. The Apache Spark and Apache Hadoop frameworks are used in the present investigation to cluster query datasets of different sizes in the MapReduce-based access plan recommendation method. Performance is evaluated based on execution time. The experimental results demonstrate the effectiveness of parallel query clustering in achieving high scalability. Furthermore, Apache Spark outperformed Apache Hadoop, reaching an average speedup of 2x.

Visual Dynamic Simulation Model of Unstructured Data in Social Networks

Zhang 2022

Security and Communication Networks


Social networks contain a large amount of unstructured data. To ensure the stability of unstructured big data, this study proposes a visual dynamic simulation model of unstructured data in social networks. Using the Hadoop platform and data visualization technology, the study establishes a univariate linear regression model based on the time correlation between data, estimates and approximates perceptual data, and collects unstructured data from social networks. The collected data are then processed, and an adaptive threshold is designed to filter out the influence of noise. After feature analysis, the data are processed to extract their visual features. Finally, the study designs a Hadoop cluster, implements data persistence with HDFS, uses MapReduce to extract data clusters for distributed computing, builds the visual dynamic simulation model, and displays the unstructured social network data. The experimental results show that this method visualizes unstructured social network data well and can effectively improve the stability and efficiency of that visualization.

Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Azhir, Hosseinzadeh, Khan et al. 2022

Preprint

Self Cite




Cited by 3 publications

References 29 publications
