2014
DOI: 10.1007/978-3-319-13021-7_12

A Study of SQL-on-Hadoop Systems

Cited by 27 publications (9 citation statements)
References 10 publications
“…Similarly, Chen et al compare multiple SQL‐on‐Hadoop engines using modified TPC‐DS queries on clusters with varying number of nodes. In terms of storage formats, they use the default ORC and Parquet configuration parameters.…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…For example, ORC is favored by Hive and Presto, whereas Parquet is first choice for SparkSQL and Impala. A number of studies have investigated and compared the performance of file formats running them on different SQL‐on‐Hadoop engines. However, because of the different internal engine architectures, these works actually compare the engine together with its file format optimizations.…”
Section: Introduction (mentioning)
confidence: 99%
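To make the engine-plus-format coupling described in the statement above concrete, the following is a minimal PySpark sketch that writes the same synthetic table in both ORC and Parquet and runs the same filter-and-count on each copy. The paths, schema, and toy timing are hypothetical illustrations, not taken from the cited studies, and any measured difference reflects Spark's own format optimizations rather than a general verdict on ORC versus Parquet.

```python
# Minimal sketch; assumes a local PySpark installation. Paths and the
# synthetic table below are hypothetical, not from the cited benchmarks.
import time

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orc-vs-parquet-sketch").getOrCreate()

# A small synthetic table standing in for a benchmark fact table.
df = spark.range(0, 1_000_000).withColumn("val", F.col("id") % 97)

# Write the same data once per storage format.
df.write.mode("overwrite").orc("/tmp/bench_orc")
df.write.mode("overwrite").parquet("/tmp/bench_parquet")

def timed_scan(fmt, path):
    # Run the same aggregate over each copy; wall-clock time only hints at
    # the engine/format interplay discussed in the citing papers.
    start = time.time()
    rows = spark.read.format(fmt).load(path).where("val < 10").count()
    return rows, time.time() - start

for fmt, path in [("orc", "/tmp/bench_orc"), ("parquet", "/tmp/bench_parquet")]:
    rows, secs = timed_scan(fmt, path)
    print(f"{fmt}: {rows} matching rows scanned in {secs:.2f}s")

spark.stop()
```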
“…As seen previously in the literature, one important feature of data is the type of file formats, stating the way in which data is stored (Li & Zhou, ). For this benchmark, the recommendations of the Stinger initiative (Chen et al, ) were followed, storing data in the ORC format and using Tez as the execution engine when evaluating Hive. ORC stands for optimized row columnar, optimizing data storage when compared with other file formats.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
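As a hedged illustration of the Stinger-style setup mentioned above (ORC storage with Tez as Hive's execution engine), the following Python sketch issues the corresponding HiveQL through PyHive. It assumes PyHive is installed and a HiveServer2 instance is reachable at the hypothetical host and port shown; the table name and columns are made up for illustration only.

```python
from pyhive import hive  # assumption: PyHive installed, HiveServer2 running

# Hypothetical connection details; substitute a real HiveServer2 endpoint.
conn = hive.connect(host="hive.example.com", port=10000)
cur = conn.cursor()

# Stinger-style configuration: Tez as the execution engine for this session.
cur.execute("SET hive.execution.engine=tez")

# Store the table as ORC (optimized row columnar), as in the benchmark above.
cur.execute("""
    CREATE TABLE IF NOT EXISTS store_sales_orc (
        item_id  BIGINT,
        quantity INT,
        net_paid DOUBLE
    )
    STORED AS ORC
""")

# Queries against the ORC-backed table now run on Tez and benefit from
# ORC's columnar reads and predicate pushdown.
cur.execute("SELECT COUNT(*) FROM store_sales_orc WHERE quantity > 10")
print(cur.fetchone())

cur.close()
conn.close()
```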
“…Since then, many database benchmarks have been proposed by academia and industry for various evaluation goals, such as TPC-C [25] for RDBMSs, TPC-DI [21] for data integration; OO7 benchmark [2] for object-oriented DBMSs, and XML benchmark systems [15,23] for XML DBMSs. More recently, the NoSQL and big data movement in the late 2000s brought the arrival of the next generation of benchmarks, such as YCSB benchmark [4] for cloud serving systems, LDBC [6] for Graph and RDF DBMSs, BigBench [3,10] for big data systems. However, those general-purpose or micro benchmarks are not designed for MMDBs.…”
Section: Introduction (mentioning)
confidence: 99%