2018
DOI: 10.1002/widm.1297
Big data processing tools: An experimental performance evaluation

Abstract: Big Data is currently a hot topic of research and development across several business areas, mainly due to recent innovations in information and communication technologies. One of the main challenges of Big Data relates to how one should efficiently handle massive volumes of complex data. Due to the notorious complexity of the data that can be collected from multiple sources, usually motivated by increasing data volumes gathered at high velocity, efficient processing mechanisms are needed for data analysis purposes…



Cited by 18 publications (23 citation statements)
References 20 publications
“…The work of [9] benchmarks different SQL-on-Hadoop systems (Hive, Spark, Presto and Drill) using the Star Schema Benchmark (SSB), also used in [10], which tests Hive and Presto with different partitioning and bucketing strategies. In [6], Drill, HAWQ, Hive, Impala, Presto and Spark were benchmarked, showing the advantages of in-memory processing tools like HAWQ, Impala and Presto. Despite the good performance of these in-memory processing tools, that work also shows the increase in processing time observed when these tools do not have enough RAM and activate the "Spill to Disk" functionality, making use of secondary memory.…”
Section: Related Work
confidence: 99%
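The bucketing strategy mentioned above works by hashing a bucket key into a fixed number of files, so an equality filter on that key only needs to scan one bucket. A minimal Python sketch of the idea (the bucket count, column names, and hash function are illustrative, not taken from the benchmarked systems):

```python
# Sketch of Hive-style bucketing: rows are routed to a fixed number of
# buckets by hashing the bucket key; a point query then scans one bucket.
NUM_BUCKETS = 4

def bucket_of(key: str) -> int:
    # Deterministic toy hash so the sketch is reproducible (Python's
    # built-in hash() is salted per process for strings).
    return sum(key.encode()) % NUM_BUCKETS

def write_bucketed(rows):
    # "Write side": distribute rows into buckets by the bucket key.
    buckets = {i: [] for i in range(NUM_BUCKETS)}
    for row in rows:
        buckets[bucket_of(row["customer"])].append(row)
    return buckets

def point_query(buckets, customer):
    # "Read side": only the one bucket that can contain the key is scanned.
    candidate = buckets[bucket_of(customer)]
    return [r for r in candidate if r["customer"] == customer]

rows = [{"customer": c, "amount": a}
        for c, a in [("alice", 10), ("bob", 20), ("carol", 30), ("alice", 5)]]
buckets = write_bucketed(rows)
print(point_query(buckets, "alice"))  # both "alice" rows, from one bucket
```

Real systems use this same pruning to cut I/O; the excerpt's "Spill to Disk" cost appears on the opposite side, when intermediate state no longer fits in RAM.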
“…The Big Data concept also impacts the traditional Data Warehouse (DW), leading to the Big Data Warehouse (BDW), which shares the same goals in terms of data integration and decision-making support but addresses Big Data characteristics [4], [5] such as massively parallel processing; mixed and complex analytical workloads (e.g., ad hoc querying, data mining, text mining, exploratory analysis and materialized views); and flexible storage to support data from several sources or real-time operations (stream processing, low latency and high-frequency updates), to mention only a few. SQL-on-Hadoop systems are also gaining notoriety, aiming for interactive, low-latency query execution that provides timely analytics to support the decision-making process, in which every second counts [6]. Aligned with the research trends of supporting OLAP (Online Analytical Processing) workloads and aggregations over Big Data [7], this paper compares Apache Druid, which promises fast aggregations in Big Data environments [8], with two well-known SQL-on-Hadoop systems, Hive and Presto.…”
Section: Introduction
confidence: 99%
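The fast OLAP aggregations Druid promises rest largely on ingestion-time rollup: events sharing the same timestamp bucket and dimension values are pre-aggregated, so queries touch far fewer rows. A minimal Python sketch of that idea (the schema, dimension names, and metric are illustrative assumptions, not Druid's actual API):

```python
from collections import defaultdict

# Sketch of Druid-style ingestion rollup: events with identical
# (time bucket, dimensions) are collapsed into one pre-aggregated row.
def rollup(events, dims, metric):
    agg = defaultdict(int)
    for e in events:
        key = tuple(e[d] for d in dims)
        agg[key] += e[metric]
    return dict(agg)

events = [
    {"hour": "10:00", "country": "PT", "clicks": 3},
    {"hour": "10:00", "country": "PT", "clicks": 2},
    {"hour": "10:00", "country": "ES", "clicks": 1},
    {"hour": "11:00", "country": "PT", "clicks": 4},
]
segment = rollup(events, dims=("hour", "country"), metric="clicks")
print(segment)  # 4 raw events collapsed into 3 pre-aggregated rows
```

A later group-by over `hour` and `country` scans the 3 rolled-up rows instead of the 4 raw events; at scale, this reduction is what makes sub-second aggregations feasible.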
“…The goal of this fourth phase is to select the most suitable technology for non-raw data repositories that provide analytical capabilities (REP). Below, we specify the requirements from Table 1 that carry the most weight [5], [30], [31] in the selection of the technology for the analytical repository, also called the Big Data Warehouse [16]:…”
Section: F. Phase 4: Analytical Repositories Implementation
confidence: 99%
“…Considering that Hive has native support for ORC files, and that the ORC file format offers good encodings, compression algorithms, and multi-dimensional statistical information [29], the data in this paper is stored in the ORC file format.…”
Section: Hive File Format
confidence: 99%
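The "multi-dimensional statistical information" the excerpt credits to ORC refers to per-stripe column statistics (min/max, among others), which let a reader skip whole stripes that cannot match a filter. A minimal Python sketch of that pruning logic, with an illustrative stripe size and toy integer data rather than real ORC structures:

```python
# Sketch of ORC-style stripe statistics: each stripe records min/max per
# column, so a filter can skip stripes whose range cannot match.
def build_stripes(values, stripe_size):
    stripes = []
    for i in range(0, len(values), stripe_size):
        chunk = values[i:i + stripe_size]
        stripes.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return stripes

def filtered_scan(stripes, predicate_value):
    # Predicate pushdown: only stripes whose [min, max] interval contains
    # the searched value are actually read.
    hits, scanned = [], 0
    for s in stripes:
        if s["min"] <= predicate_value <= s["max"]:
            scanned += 1
            hits.extend(v for v in s["rows"] if v == predicate_value)
    return hits, scanned

values = list(range(100))           # sorted data maximizes stripe skipping
stripes = build_stripes(values, 25)
hits, scanned = filtered_scan(stripes, 42)
print(hits, scanned)  # [42] found after scanning only 1 of 4 stripes
```

This is why storing data sorted (or partitioned) on frequently filtered columns compounds with the ORC format: tighter min/max ranges per stripe mean more stripes can be skipped.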