2013
DOI: 10.1007/978-3-319-03689-2_12

Performance Comparison of Hadoop Based Tools with Commercial ETL Tools – A Case Study

Cited by 10 publications (7 citation statements)
References 1 publication
“…ETLMR was outlined in (Liu, Thomsen & Pedersen, 2012) for a demonstration purpose. (Misra et al, 2013) shows that ETL solutions based on MapReduce frameworks, such as Apache Hadoop, are very efficient and less costly compared to ETL tools market. Recently, the authors in (Liu et al, 2014) have proposed the CloudETL framework which uses Apache Hadoop to parallelize ETL processes and Apache Hive to process data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, the emergence of big data has generated much interest in the research community. Some authors such as (Liu, Thomsen & Pedersen, 2011), (Liu, Thomsen & Pedersen, 2014), and (Misra, Saha & Mazumdar, 2013) have proposed interesting ETL approaches. Indeed, our study is motivated by the fact that the existing conceptual modeling approaches such as (El Akkaoui & Zimányi, 2009), (Trujillo & Luján-Mora, 2003), and (Vassiliadis et al, 2002) are not suitable for big data environments.…”
Section: Introduction (mentioning)
confidence: 99%
“…The conventional ETL system is typically operated on a single machine that cannot effectively handle huge volumes of big data [12]. To deal with the considerable quantity of big data in the ETL process, there have been several attempts in recent years to utilize a parallelized data processing concept [13][14][15].…”
Section: Introduction (mentioning)
confidence: 99%
“…This study conducted an experimental evaluation assessing system scalability based on different scales of jobs and data to compare with other MapReduce-based tools. Another study [15] compared Hadoop-based ETL solutions with commercial ETL solutions in terms of cost and performance. They concluded that Hadoop-based ETL solutions are better in comparison to existing commercial ETL solutions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Pitfall of this effort, needed for data cleaning during extraction and integration, is increasing response times but is necessary to achieve query-optimization and data quality (Misra, Saha et al 2013). Thus, data cleaning is an on-going process that requires awareness of underlying fundamental principles that are subjected to performance improvement (Chaturvedi, Faruquie et al 2015).…”
Section: Introduction (mentioning)
confidence: 99%