2017
DOI: 10.1007/s11227-016-1949-7
Moving metadata from ad hoc files to database tables for robust, highly available, and scalable HDFS

Cited by 14 publications (5 citation statements)
References 19 publications
“…A recent study conducted by [24] showed that the k-means algorithm based on the Hadoop framework with 3 slave nodes achieves up to a 2x speedup over sequential k-means. However, the overhead of reading and writing data to local storage at each iteration harms the overall performance of the Hadoop framework [28].…”
Section: Related Work
confidence: 99%
“…It was found to achieve a significant speedup over the sequential k-means algorithm. However, the overhead of storing and retrieving data to/from HDFS at each iteration degrades the overall performance of the Hadoop framework [28]. Additionally, Hadoop is not well suited to small files [35].…”
Section: Related Work
confidence: 99%
“…Whereas Apache Hadoop follows a disk-based model for reading and writing data, Apache Spark performs in-memory computations on resilient distributed datasets. Apache Hadoop is an open-source, Java-based distributed computing framework built for applications implemented using the MapReduce parallel data processing paradigm [7] and the Hadoop Distributed File System (HDFS) [8].…”
Section: Background and Motivation
confidence: 99%
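Several of the statements above attribute Hadoop's weakness on iterative algorithms such as k-means to the disk/HDFS round trip at every iteration. The following is a minimal, single-machine Python sketch (not Hadoop code; all names are illustrative) of the iterative structure in question: each iteration makes a full pass over the data, which under MapReduce corresponds to one job that rereads its input from storage, whereas Spark would keep the points cached in memory across iterations.

```python
import random

def kmeans(points, k, iterations):
    """Lloyd's k-means on 1-D points; each loop body is one full data pass."""
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # In MapReduce, this pass over `points` is one job per iteration:
        # map = assign each point to its nearest centroid,
        # reduce = average the assigned points per cluster.
        # The job's input is reread from HDFS/local disk every time,
        # which is the per-iteration overhead cited in [28].
        clusters = {i: [] for i in range(k)}
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in clusters.items()]
    return centroids

random.seed(0)
# Two well-separated 1-D Gaussian clusters around 0 and 10.
data = ([random.gauss(0, 1) for _ in range(100)] +
        [random.gauss(10, 1) for _ in range(100)])
print(sorted(round(c, 2) for c in kmeans(data, 2, 10)))
```

Spark avoids the reread by caching the equivalent of `points` as an RDD in memory, so only the small centroid list changes between iterations.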
“…The rapid development of Big Data frameworks addresses the distribution, communication, and processing of vast amounts of data. For instance, popular frameworks such as Hadoop and Spark [2] can handle massive amounts of data, relying on the MapReduce paradigm [3] to process and generate extensive data sets [4]. The data sets are stored across distributed clusters, and each cluster runs a distributed processing scheme.…”
Section: Introduction
confidence: 99%