A comparative analysis of state-of-the-art SQL-on-Hadoop systems for interactive analytics

Tapdiya, Ashish; Fabbri, Daniel

doi:10.1109/bigdata.2017.8258066

Cited by 4 publications

References 13 publications

(21 reference statements)

On the performance of SQL scalable systems on Kubernetes: a comparative study

et al. 2022

The popularization of Hadoop as the de facto standard platform for data analytics in the context of Big Data applications has led to an upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines allowing the use of SQL queries on data stored in HDFS. In this context, Kubernetes appears as the leading choice to simplify the deployment and scaling of containerized applications; however, there is a lack of studies on the performance of SQL-on-Hadoop systems deployed on Kubernetes, and this is the gap we intend to fill in this paper. We present an experimental study involving four representative SQL scalable platforms: Apache Drill, Apache Hive, Apache Spark SQL and Trino. Concretely, we use the TPC-H benchmark to analyze the performance of these systems when they are deployed on a Hadoop cluster with Kubernetes. The results of our study can help practitioners and users understand what performance to expect if they plan to use Kubernetes to deploy applications built on the analyzed SQL scalable platforms.

Importance of Data Distribution on Hive-Based Systems for Query Performance: An Experimental Study

Ciritoglu, Murphy, Thorpe 2020

2020 IEEE International Conference on Big Data and Smart Computing (BigComp)

A Review on Big Data Optimization Techniques

Nerić, Sarajlić 2020

B&H Electrical Engineering

Analysis of representative tools for SQL query processing on Hadoop (SQL-on-Hadoop systems), such as Hive, Impala, Presto, and Shark, shows that they are still not sufficiently efficient for complex analytical queries and interactive query processing. Existing SQL-on-Hadoop systems stand to benefit from modern query processing techniques that have been studied extensively for many years in the database community. It is expected that applying these advanced techniques can improve the performance of SQL-on-Hadoop systems. The main idea of this paper is to review big data concepts and technologies and to summarize big data optimization techniques that can be used to improve performance when processing big data.
