Qinlu He scite author profile

Deduplication is a popular data reduction technology in storage systems. It finds and eliminates duplicate data, which reduces the storage capacity required, increases resource utilization, and lowers storage costs. File features are the key factor in calculating the similarity between files, but similarity calculated from a single feature has limitations, especially for similar files. The storage node feature reflects the load condition of the node, which is a key factor to consider in data routing. This paper introduces a multifeature data routing strategy (DRMF). The routing strategy is based on the features of the cluster and comprises routing communication, file similarity calculation, and determination of the target node. Routing communication provides the information exchange between routing servers and storage nodes. Each storage node calculates the similarity against the files it stores, and the routing server determines the target node according to the similarity results and the node load features. The file is then routed according to the information provided by the routing server. A system prototype is designed and implemented, and a system is developed to process the cluster features and determine the specific parameters of the various features through experiments. Finally, multifeature and single-feature data routing are simulated, and the deduplication rate and data skew of the two strategies are compared. The experimental results show that, compared with the single-feature routing strategy MCS, DRMF improves the deduplication rate of the cluster and maintains a lower data skew rate.
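The target-node decision described in this abstract can be illustrated with a small sketch. It is not the DRMF algorithm itself: the `NodeReport` type, the linear scoring rule, and the weight `alpha` are hypothetical choices made only to show how a similarity feature and a load feature might be combined.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """What a storage node reports to the routing server (hypothetical shape)."""
    node_id: str
    similarity: float  # similarity of the incoming file to data on this node, in [0, 1]
    load: float        # fraction of the node's capacity already in use, in [0, 1]


def choose_target(reports, alpha=0.7):
    """Return the node_id with the best blend of similarity and spare capacity.

    alpha is an assumed weight: 1.0 means similarity only (single-feature
    routing), lower values give more influence to the node load.
    """
    if not reports:
        raise ValueError("no storage nodes reported")

    def score(report):
        return alpha * report.similarity + (1.0 - alpha) * (1.0 - report.load)

    return max(reports, key=score).node_id


reports = [
    NodeReport("node-a", similarity=0.80, load=0.95),
    NodeReport("node-b", similarity=0.75, load=0.30),
    NodeReport("node-c", similarity=0.10, load=0.05),
]
print(choose_target(reports))             # blends similarity and load
print(choose_target(reports, alpha=1.0))  # similarity only
```

With these made-up numbers, the blended score prefers the lightly loaded `node-b` over the slightly more similar but nearly full `node-a`, which is the kind of skew a similarity-only rule would not avoid.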


Research on Data Routing Strategy of Deduplication in Cloud Environment

et al. 2022

IEEE Access

The application of data deduplication technology reduces the demand for data storage and improves resource utilization. Compared with the limited storage and computing capacity of a single node, cluster data deduplication technology has great advantages. However, cluster data deduplication also raises new issues: a reduced deduplication rate and load balancing across storage nodes. A data routing strategy can balance the deduplication rate against load balancing. Therefore, this paper proposes a data routing strategy based on a distributed Bloom filter. 1) The superchunk is used as the basic unit of data routing to improve system throughput. According to Broder's theorem, the k smallest fingerprints are selected as the superchunk features and sent to the storage nodes. The optimal node is selected as the routing node by matching the features against the Bloom filter maintained in the memory of each storage node and by considering the storage capacity of the node. 2) System prototypes are designed and implemented. The specific parameters of the routing strategies are obtained through experiments, and the routing strategy proposed in this paper is tested. The theoretical analysis and experimental results demonstrate the feasibility of the proposed strategy. Compared with other routing strategies, the method improves the deduplication rate by 3%, reduces the communication query overhead by more than 36%, and improves the load balancing degree of the storage system.
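The routing step in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the SHA-1 chunk fingerprints, the Bloom filter parameters, and the rule "most Bloom filter hits, then lowest used capacity" are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Small in-memory Bloom filter (sizes are illustrative assumptions)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def superchunk_features(chunks, k=4):
    """Return the k smallest distinct chunk fingerprints of a superchunk."""
    fingerprints = {hashlib.sha1(chunk).digest() for chunk in chunks}
    return sorted(fingerprints)[:k]


def route(features, nodes):
    """Pick a node id from {node_id: (bloom_filter, used_fraction)}.

    Assumed rule: most feature hits wins; ties go to the emptier node.
    """
    def key(item):
        _, (bloom, used) = item
        hits = sum(1 for f in features if f in bloom)
        return (hits, -used)

    return max(nodes.items(), key=key)[0]


chunks = [b"chunk-%d" % i for i in range(10)]
features = superchunk_features(chunks, k=4)
seen, empty = BloomFilter(), BloomFilter()
for f in features:
    seen.add(f)
print(route(features, {"n1": (seen, 0.9), "n2": (empty, 0.1)}))
```

Because only k fingerprints per superchunk are sent and each node answers from an in-memory filter, the query traffic stays small; the tie-break on used capacity stands in for the load-balancing consideration mentioned in the abstract.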


RTFTL: design and implementation of real-time FTL algorithm for flash memory

Bian, Zhang et al. 2022

J Supercomput

Dynamic decision-making strategy of replica number based on data hot

et al. 2023

J Supercomput

TCFTL: Improved Real-Time Flash Memory Two Cache Flash Translation Layer Algorithm

Journal of Nanoelectronics and Optoelectronics

et al. 2021

The traditional flash translation layer (FTL) algorithm mainly aims to optimize the average response time of flash reads and writes. Because flash memory cannot be updated in place, a traditional FTL algorithm must find a free page for every write, and when a block is full it allocates a new free block. Therefore, when the flash memory is almost full, a write request triggers garbage collection, which involves many copy writes and substantially degrades response time. This paper proposes an algorithm that makes full use of spatial locality and temporal locality to optimize the address cache in the Demand-based Flash Translation Layer (DFTL) algorithm. The algorithm is evaluated experimentally, and good results are obtained.


Synchronization Control for Hydraulic Motors of Boom Refueling Experimental Platform

et al. 2018

Modeling and controller design by sliding-mode control for refueling boom

2016

File block multi-replica management technology in cloud storage

et al. 2023

Cluster Comput