A Study of Effective Replica Reconstruction Schemes for the Hadoop Distributed File System

Higai, Asami; Takefusa, Atsuko; Nakada, Hidemoto; Oguchi, Masato

doi:10.1587/transinf.2014edp7242

Cited by 5 publications

(3 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…HDFS is the foundation of data storage management in distributed computing, which has the advantages of high reliability, strong expansibility, and throughput. The premise and goal of the system design are as follows [24][25][26][27]:…”

Section: Foundationmentioning

confidence: 99%

See 1 more Smart Citation

Dynamic Prediction Research of Silicon Content in Hot Metal Driven by Big Data in Blast Furnace Smelting Process under Hadoop Cloud Platform

Han

Yang

et al. 2018

Complexity

View full text Add to dashboard Cite

In order to explore a dynamic prediction model with good generalization performance of the content of [Si] in molten iron, an improved SVM algorithm is proposed to enhance its practicability in the big data sample set of the smelting process. Firstly, we propose a parallelization scheme to design an SVM solution algorithm based on the MapReduce model under a Hadoop platform to improve the solution speed of the SVM on big data sample sets. Secondly, based on the characteristics of stochastic subgradient projection, the execution time of the SVM solver algorithm does not depend on the size of the sample set, and a structured SVM algorithm based on the neighbor propagation algorithm is proposed, and on this basis, a parallel algorithm for solving the covariance matrix of the training set and a parallel algorithm of the tth iteration of the random subgradient projection are designed. Finally, the historical production big data of No. 1 blast furnace in Tangshan Iron Works II was analyzed during 2015.12.01~2016.11.30 using the reaction mechanism, control mechanism, and gray correlation model in the process of blast furnace iron-making, an essential sample set with input x1k,x2k−3,x3k−3,…,x18k,x19k−1 and output Sik+1 is constructed, and the dynamic prediction model of the content of [Si] in molten iron and the dynamic prediction model of [Si] fluctuation in the molten iron are obtained on the Hadoop platform by means of the structure and parallelized SVM solving algorithm. The results of the research show that the structural and parallel SVM algorithms in the hot metal [Si] content value dynamic prediction hit rate and lifting dynamic prediction hit rate were 91.2% and 92.2%, respectively. Two kinds of dynamic prediction algorithms based on structure and parallelization are 54 times and 5 times faster than traditional serial solving algorithms.

show abstract

Section: Foundationmentioning

confidence: 99%

“…With the aid of Formula (26) and the MapReduce model, the solution of the covariance matrix of the whole training set is solved. The specific algorithm is described as follows:…”

Section: Structure and Parallelization Of The Pegasos Algorithmmentioning

confidence: 99%

Dynamic Prediction Research of Silicon Content in Hot Metal Driven by Big Data in Blast Furnace Smelting Process under Hadoop Cloud Platform

Han

Yang

et al. 2018

Complexity

View full text Add to dashboard Cite

show abstract

“…In order to ensure the reliability and availability of data storage, replication strategy and erasure codes have been more widely adopted in many current DSSs [3][4][5][6]. For example, Google File System (GFS) and Hadoop Distributed File System (HDFS) adopt multi-replication [7,8]. However, since multi-replication needs to store a large number of data to ensure high reliability, its storage cost is high.…”

Section: Introductionmentioning

confidence: 99%

Minimum Bandwidth Regenerating Codes Based on Cyclic VFR Codes

Wang¹,

Wang²,

Wang³

et al. 2019

KSII TIIS

View full text Add to dashboard Cite

In order to improve the reliability and repair efficiency of distributed storage systems, minimum bandwidth regenerating (MBR) codes based on cyclic variable fractional repetition (VFR) codes are constructed in this thesis, which can repair failed nodes accurately. Specifically, in order to consider the imbalance of data accessed by the users, cyclic VFR codes are constructed according to that data with different heat degrees are copied in different repetition degrees. Moreover, we divide the storage nodes into groups, and construct MBR codes based on cyclic VFR codes to improve the file download speed. Performance analysis and simulation results show that, the repair locality of a single node failure is always 2 when MBR codes based on cyclic VFR codes are adopted in distributed storage systems, which is obviously superior to the traditional MBR codes. Compared with RS codes and simple regenerating codes, the proposed MBR codes based on cyclic VFR codes have lower repair locality, repair complexity and bandwidth overhead, as well as higher repair efficiency. Moreover, relative to FR codes, the MBR codes based on cyclic VFR codes can be applicable to more storage systems.

show abstract

Parallel processing optimization strategy based on MapReduce model in cloud storage environment

Cui

Liu

Li³

2017

AIP Conference Proceedings

View full text Add to dashboard Cite

A Study of Effective Replica Reconstruction Schemes for the Hadoop Distributed File System

Cited by 5 publications

References 9 publications

Dynamic Prediction Research of Silicon Content in Hot Metal Driven by Big Data in Blast Furnace Smelting Process under Hadoop Cloud Platform

Dynamic Prediction Research of Silicon Content in Hot Metal Driven by Big Data in Blast Furnace Smelting Process under Hadoop Cloud Platform

Minimum Bandwidth Regenerating Codes Based on Cyclic VFR Codes

Parallel processing optimization strategy based on MapReduce model in cloud storage environment

Contact Info

Product

Resources

About