2020
DOI: 10.2991/ijndc.k.200515.007

High Performance Hadoop Distributed File System

Abstract: Although by the end of 2020 most companies will be running 1000-node Hadoop clusters, Hadoop implementations still face many challenges, such as security, fault tolerance, and flexibility. Hadoop is a software paradigm for handling big data, and it includes a distributed file system called the Hadoop Distributed File System (HDFS). HDFS handles fault tolerance using a data replication technique: it replicates data across multiple DataNodes, which means the reliability and ava…
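The replication technique mentioned in the abstract can be illustrated with a minimal sketch. This is not HDFS's actual placement code: real HDFS uses a rack-aware policy (first replica on the writer's node, remaining replicas spread across racks), while the function below simply picks the default three distinct DataNodes deterministically; all node names are hypothetical.

```python
def place_replicas(block_id, datanodes, replication_factor=3):
    """Choose distinct DataNodes to hold copies of one block.

    Simplified stand-in for HDFS placement: the default HDFS
    replication factor is 3, and each replica must land on a
    different DataNode.
    """
    if replication_factor > len(datanodes):
        raise ValueError("not enough DataNodes for the requested factor")
    # Deterministic round-robin keyed on the block id, so different
    # blocks spread their replicas over the cluster.
    start = sum(block_id.encode()) % len(datanodes)
    return [datanodes[(start + i) % len(datanodes)]
            for i in range(replication_factor)]

nodes = ["dn1", "dn2", "dn3", "dn4"]
print(place_replicas("blk_0001", nodes))  # three distinct DataNodes
```

If one DataNode fails, the NameNode can re-replicate the affected blocks from the surviving copies, which is what gives HDFS its fault tolerance.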

Cited by 15 publications (7 citation statements)
References 10 publications
“…These processes are distributed throughout the cluster of computers in parallel [ 50 ]. For further information addressing the fundamentals of Hadoop MapReduce, see [ 51 , 52 ].…”
Section: Proposed Malicious URL Detection Model
confidence: 99%
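The MapReduce flow this citing paper refers to can be sketched in-process. This is not the Hadoop Java API, just a single-machine illustration of the map, shuffle, and reduce phases that Hadoop distributes across the cluster in parallel.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs, as a Hadoop mapper would.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key across mapper outputs.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently; in Hadoop
    # these reducers run in parallel on different cluster nodes.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["hadoop stores data", "hadoop processes data"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```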
“…Apache Hadoop is an HPDC that supports HTTP REST APIs to interact with the HDFS [22], [23], so after the system determines the new clusters of the files, it will call the HDFS API endpoint to apply the new replication policy for each file.…”
Section: E. Applying New Replication Policies
confidence: 99%
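The HDFS REST API call mentioned in this citing paper is the WebHDFS `SETREPLICATION` operation, exposed as `PUT /webhdfs/v1/<PATH>?op=SETREPLICATION&replication=<SHORT>`. The sketch below only builds the request URL (no network call); the NameNode host and file path are placeholders.

```python
from urllib.parse import urlencode

def setreplication_url(namenode, path, replication):
    """Build the WebHDFS URL that changes a file's replication factor.

    WebHDFS exposes this operation as:
        PUT /webhdfs/v1/<PATH>?op=SETREPLICATION&replication=<SHORT>
    The namenode host:port and path here are illustrative placeholders.
    """
    query = urlencode({"op": "SETREPLICATION", "replication": replication})
    return f"http://{namenode}/webhdfs/v1{path}?{query}"

url = setreplication_url("namenode.example:9870", "/data/file1.csv", 2)
print(url)
# http://namenode.example:9870/webhdfs/v1/data/file1.csv?op=SETREPLICATION&replication=2
```

A clustering system like the one described could iterate over its file groups and issue one such `PUT` per file to apply the new per-cluster replication policy.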
“…Following the classification, the credit cards with valid records are parsed to recognize the number of credit cards. It is then saved to the distributed file system, which is a Hadoop distributed file system (HDFS) in this implementation [25,26]. One point to be noted concerning this implementation is the feature of user notifications.…”
Section: Fog Computing Performance Evaluation
confidence: 99%
“…There are ideal tools for batch processing and maintenance layers, such as Hadoop and Impala. Hadoop is relevant because it can process and store petabytes of data, while Impala, in turn, interactively requests such data [25]. The real-time requirements for the batch and service layers are not met, since MapReduce itself has a significant delay, so hours of time can be required before the presentation data are propagated to the serving layer.…”
confidence: 99%