An efficient Hadoop data replication method design for heterogeneous clusters

Park, Daeshin; Kang, Kiwook; Hong, Jiman; Cho, Yookun

doi:10.1145/2851613.2851945

Cited by 8 publications

(3 citation statements)

References 3 publications


“…Another data placement approach considers the characteristics of the data, such as popularity [15–18]. For example, Xiong et al. [15,16] utilized both the computing performance of each node and the popularity of each block, evaluated using the time series and frequency of requests.…”

Section: Related Work (mentioning)

confidence: 99%
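The statement describes placement driven by both node computing performance and block popularity. A minimal Python sketch of that idea follows. It assumes popularity is an exponentially decayed request count (one simple way to fold the time series and frequency of requests into a single score) and that blocks are spread in proportion to a per-node speed factor. The decay form, `half_life`, `place_blocks`, and the greedy assignment are hypothetical choices for illustration, not the scheme of Xiong et al.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    block_id: str
    request_times: list[float]  # timestamps (seconds) of past requests


def popularity(block: Block, now: float, half_life: float = 3600.0) -> float:
    """Exponentially decayed request count: recent requests weigh more.

    Hypothetical scoring; the decay form and half_life are assumptions.
    """
    decay = math.log(2) / half_life
    return sum(math.exp(-decay * (now - t)) for t in block.request_times)


def place_blocks(blocks: list[Block], node_speed: dict[str, float], now: float) -> dict[str, str]:
    """Assign blocks to nodes roughly in proportion to node speed.

    The most popular blocks are handed out first; each goes to the node
    whose normalized load (blocks held / speed) is currently lowest,
    with ties broken in favor of the faster node.
    """
    ranked = sorted(blocks, key=lambda b: popularity(b, now), reverse=True)
    held = {node: 0 for node in node_speed}
    placement = {}
    for b in ranked:
        target = min(node_speed, key=lambda n: (held[n] / node_speed[n], -node_speed[n]))
        placement[b.block_id] = target
        held[target] += 1
    return placement
```

With `node_speed = {"fast": 2.0, "slow": 1.0}`, the faster node ends up with about twice as many blocks, and the most popular block is placed first, on the fast node.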

“…Tandon et al. [17] proposed a scheme that replicates popular blocks to reduce the network overhead caused by remote access. Park et al. [18] proposed a scheme that replicates the least frequently accessed and oldest blocks. Wang et al. [19] and Wu et al. [20] proposed data placement strategies focusing on correlated data, based on the fact that data locality increases when correlated data are accessed simultaneously.…”

Section: Related Work (mentioning)

confidence: 99%
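The two replication policies contrasted here differ mainly in which end of the access ranking they pick from. The sketch below expresses each as a sort key over hypothetical per-block statistics (`access_count`, `created_at`). The statement gives only the selection criteria, so the counters, the tie-breaking, and the function names are assumptions, not the implementations of Tandon et al. or Park et al.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    block_id: str
    access_count: int  # number of reads observed (hypothetical metric)
    created_at: float  # creation timestamp in seconds (hypothetical metric)


def most_popular(blocks: list[BlockStats], k: int) -> list[str]:
    """Popularity-driven choice: the k most frequently accessed blocks."""
    ranked = sorted(blocks, key=lambda b: -b.access_count)
    return [b.block_id for b in ranked[:k]]


def least_frequent_oldest(blocks: list[BlockStats], k: int) -> list[str]:
    """Fewest accesses first; among equal counts, the older block wins."""
    ranked = sorted(blocks, key=lambda b: (b.access_count, b.created_at))
    return [b.block_id for b in ranked[:k]]
```

Because both policies are just orderings, swapping one for the other in a replication manager only changes the sort key, not the surrounding replication logic.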


Novel data‐placement scheme for improving the data locality of Hadoop in heterogeneous environments

Bae, Yeo, Park, et al. (2020)

Concurrency and Computation


Summary: To address the challenging needs of high‐performance big data processing, parallel‐distributed frameworks such as Hadoop are used extensively. However, in heterogeneous environments, the performance of Hadoop clusters is below par. This is primarily because blocks are allocated equally to all nodes without regard to differences in the capability of individual nodes, which reduces data locality. Thus, a new data‐placement scheme that enhances data locality is required for Hadoop in heterogeneous environments. This article proposes a new data‐placement scheme that preserves the same degree of data locality in heterogeneous environments as standard Hadoop, with only a small amount of replicated data. In the proposed scheme, only the blocks with the highest probability of being accessed remotely are selected and replicated. Experimental results indicate that the proposed scheme incurs only a 20% disk space overhead and achieves virtually the same data locality ratio as standard Hadoop, which, with a replication factor of three, incurs a 200% disk space overhead.
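The overhead figures in the summary follow from counting extra copies: storing each block r times costs (r − 1) × 100% extra space, so a replication factor of three gives 200%. The sketch below pairs that arithmetic with a hypothetical selector that adds one extra replica to the blocks with the highest estimated remote-access probability until a disk budget is spent. The probability estimates, equal block sizes, and the one-extra-replica rule are assumptions; the summary does not say how Bae et al. estimate the probability or size the budget.

```python
def replication_overhead(replication_factor: int) -> float:
    """Extra disk space relative to storing each block once, in percent."""
    return (replication_factor - 1) * 100.0


def select_for_replication(remote_prob: dict[str, float], budget_pct: float) -> list[str]:
    """Pick the blocks with the highest estimated remote-access probability.

    Assumes equal-sized blocks and one extra replica per selected block, so
    a budget of budget_pct percent allows floor(n * budget_pct / 100) extra
    replicas. remote_prob is a hypothetical per-block estimate from the caller.
    """
    n_extra = int(len(remote_prob) * budget_pct / 100)
    ranked = sorted(remote_prob, key=remote_prob.get, reverse=True)
    return ranked[:n_extra]
```

For ten equal-sized blocks and a 20% budget, the selector returns the two blocks most likely to be read remotely, while `replication_overhead(3)` reproduces the 200% baseline quoted for standard Hadoop.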


“…Each slave node manages individual map or reduce tasks as a TaskTracker [11] in Hadoop 1.0; in Hadoop 2.0 and later versions, this role is taken over by YARN services. The DataNode [12] provides a service to store huge data or run MapReduce operations. The DataNodes communicate with the NameNode at regular intervals to update metadata, and the TaskTracker likewise communicates with the JobTracker regularly.…”

Section: B. Slave Node [10] (mentioning)

confidence: 99%
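The quoted passage notes that DataNodes report to the NameNode at regular intervals. The toy tracker below models that heartbeat pattern: the master records the last heartbeat from each node and treats a node as dead once it has been silent longer than an expiry interval. The expiry formula mirrors the one commonly documented for HDFS (2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`, with defaults of 300000 ms and 3 s, giving 10.5 minutes); check these against your Hadoop version. `HeartbeatTracker` is a hypothetical class for illustration, not a Hadoop API.

```python
def heartbeat_expiry_ms(heartbeat_interval_s: int = 3, recheck_interval_ms: int = 300_000) -> int:
    """Dead-node timeout in the form HDFS documents: 2 * recheck + 10 * heartbeat."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


class HeartbeatTracker:
    """Toy master-side view of worker liveness (illustrative only)."""

    def __init__(self, expiry_ms: int):
        self.expiry_ms = expiry_ms
        self.last_seen: dict[str, float] = {}  # node id -> last heartbeat time (ms)

    def heartbeat(self, node: str, now_ms: float) -> None:
        # Each periodic report simply refreshes the node's timestamp.
        self.last_seen[node] = now_ms

    def dead_nodes(self, now_ms: float) -> list[str]:
        # A node is considered dead once its silence exceeds the expiry interval.
        return sorted(n for n, t in self.last_seen.items() if now_ms - t > self.expiry_ms)
```

The same pattern applies to the TaskTracker-to-JobTracker reports mentioned in the passage; only the timeout values differ.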

Designing Authentication for Hadoop Cluster using DNA Algorithm.

Balaraju, Rao (2019)

IJRTE


Big Data (BD) generation is increasing exponentially, and BD has become essential to modern society. Hadoop Clusters (HC) provide facilities such as processing and storage, which increase the speed of analysis and storage, but they have no built-in security. HC facilitates the storage and processing of data, whereas the processing of streaming data is handled by Apache Spark. However, as data volumes grow, data storage, processing power, cluster management, and data security in HC have not kept up. In such situations, HC are scaled out beyond small-scale IT organizations and come to depend on public cloud centers, with attendant concerns about data security and communication, computation, and operational costs. Moreover, data security in HC is a major issue, and it relies on separate security mechanisms. This paper proposes a new algorithm, Built-in Authentication Based on Access (BABA), as a security instance integrated into Hadoop to secure data in HC against attackers, together with metadata security to prevent Hadoop crashes. This mechanism secures HC without additional security configurations, which reduces operational cost and computational power, increases data security, and provides a better solution for HC.


Hadoop Performance Acceleration by Effective Data and Job Placement

Shah, Padole (2020)

Advances in Intelligent Systems and Computing
