SUMMARY
This paper proposes a two-layer deduplication system for the backup process in an IT environment. The proposed system implements two chunk sizes in descending order and activates them sequentially. Together, the two layers eliminate duplicated parts of the backup target data more efficiently than any conventional single-layer deduplication system. The system weakens the substantial trade-off between the deduplication rate and performance, where performance is defined as the reduction in the number of chunks. Compared to a conventional approach, the proposed system provides an equivalent deduplication rate with much less performance degradation, or higher performance with a smaller sacrifice of the deduplication rate. In addition, it achieves more stable performance when small chunk sizes are used to process heavily duplicated and densely changed data. The benefits of the proposed system are evaluated quantitatively and compared to a conventional approach by simulation, and the results are verified on a prototype machine. © 2016 Wiley Periodicals, Inc. Electron Comm Jpn, 99(2): 28–36, 2016; Published online in Wiley Online Library (wileyonlinelibrary.com).
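The two-layer idea described in the abstract can be sketched as follows. This is a minimal illustration only, assuming fixed-size chunking and SHA-256 fingerprints; the paper's actual chunking method and index structures are not specified here, and the function names are hypothetical:

```python
import hashlib

def dedup_layer(data: bytes, chunk_size: int, index: set) -> list:
    """Split data into fixed-size chunks and keep only chunks whose
    fingerprint has not been seen before (one deduplication layer)."""
    unique = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:
            index.add(fp)
            unique.append(chunk)
    return unique

def two_layer_dedup(data: bytes, large: int = 4096, small: int = 512) -> list:
    """Layer 1 with the larger chunk size removes bulk duplicates cheaply;
    the surviving data is re-chunked at the smaller size in layer 2 for
    finer-grained elimination (chunk sizes in descending order)."""
    survivors = b"".join(dedup_layer(data, large, set()))
    return dedup_layer(survivors, small, set())

backup = b"A" * 8192 + b"B" * 4096   # heavily duplicated sample input
remaining = two_layer_dedup(backup)  # 12288 bytes reduce to 2 unique chunks
```

The large first layer keeps the chunk index small for the bulk of the data, so only the residue pays the cost of fine-grained chunking, which is the trade-off relief the abstract describes.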
Deduplication backup technology removes redundant data segments in a system in order to reduce capacity usage in the target backup storage. The technique also yields better performance, lower resource utilization, lower energy consumption, and a lower total cost of ownership (TCO). This paper describes an optimization method for deduplication backup in an IT system in which multiple deduplication processes are installed and activated simultaneously. The method provides an algorithm that assigns backup target files to the installed deduplication processes so as to maximize the aggregate deduplication ratio while satisfying predefined system requirements such as backup-window and resource-utilization limits. In a real-world system, the time taken by each deduplication process is not constant but varies due to contention for operational resources, queuing delays, and data characteristics. The proposed method models these time parameters as normally distributed, formulates the task as a discrete assignment problem, and then defines a combined integer linear programming approach with binary parameter adjustment that maximizes the deduplication ratio while ensuring that the time variance is not exceeded. Using this method, the system achieves the maximal deduplication ratio while respecting the time requirements within a predefined tolerance. The effectiveness of the method is demonstrated through simulations. © 2015 Wiley Periodicals, Inc. Electron Comm Jpn, 98(2): 10–19, 2015; Published online in Wiley Online Library (wileyonlinelibrary.com).
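The structure of this assignment problem can be sketched with toy numbers. All figures below are hypothetical, and the exhaustive search stands in for the paper's integer linear programming solver; the per-process feasibility test `mean + k·sqrt(variance) ≤ window` is a simple safety-margin stand-in for the paper's variance control under normally distributed times:

```python
from itertools import product
from math import sqrt

# Hypothetical inputs: for each (file, process) pair, the bytes saved by
# deduplication, the mean processing time, and the time variance.
files = ["f1", "f2", "f3"]
procs = ["p1", "p2"]
saved = {("f1", "p1"): 40, ("f1", "p2"): 30,
         ("f2", "p1"): 25, ("f2", "p2"): 35,
         ("f3", "p1"): 20, ("f3", "p2"): 20}
mu = {("f1", "p1"): 5, ("f1", "p2"): 6,
      ("f2", "p1"): 4, ("f2", "p2"): 3,
      ("f3", "p1"): 4, ("f3", "p2"): 5}
var = {key: 1.0 for key in mu}   # unit time variance for every pair
WINDOW, K = 12.0, 2.0            # backup window and tolerance multiplier

def best_assignment():
    """Exhaustive search over file-to-process assignments (fine at this
    toy size; a real system would use an ILP solver as in the paper)."""
    best, best_saved = None, -1
    for choice in product(procs, repeat=len(files)):
        assign = dict(zip(files, choice))
        feasible = True
        for p in procs:
            mine = [f for f in files if assign[f] == p]
            m = sum(mu[(f, p)] for f in mine)
            v = sum(var[(f, p)] for f in mine)
            if m + K * sqrt(v) > WINDOW:   # window exceeded with margin
                feasible = False
                break
        if feasible:
            total = sum(saved[(f, assign[f])] for f in files)
            if total > best_saved:
                best, best_saved = assign, total
    return best, best_saved

assignment, total_saved = best_assignment()
```

The key point the sketch preserves is that the objective (total bytes saved) and the constraint (time within the backup window at a chosen confidence level) pull against each other, which is why the optimization is nontrivial.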
This paper proposes a multi-layer deduplication system for backup operations in an IT environment. The proposed system reduces duplication in the data by applying a series of algorithms installed with different chunk sizes in descending order. Our research defines models and formulas for the cumulative deduplication rate and processing time over the multiple layers of the system, and then shows that the efficiency depends heavily on how chunk sizes are assigned to the layers, from which the optimal assignment is derived. Finally, the efficiency of the proposal is compared to that of a conventional single-layer deduplication system to confirm the improvement.
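A simple version of such a cumulative model can be written down directly. This is a sketch under two stated assumptions, not the paper's actual formula: each layer removes a fixed fraction of the data it receives, and each layer's processing cost is proportional to the data entering it:

```python
def cumulative_dedup(rates, unit_times):
    """Hypothetical multi-layer model: layer i removes fraction rates[i]
    of the data it receives, at cost unit_times[i] per unit of data.
    Returns (cumulative deduplication rate, total processing time) for
    one unit of input data."""
    remaining, total_time = 1.0, 0.0
    for r, t in zip(rates, unit_times):
        total_time += remaining * t   # cost scales with data entering layer
        remaining *= (1.0 - r)        # this layer removes fraction r
    return 1.0 - remaining, total_time

# Two layers: a cheap coarse layer removing 50%, then a 4x-costlier
# fine layer removing a further 20% of what survives.
rate, time = cumulative_dedup([0.5, 0.2], [1.0, 4.0])
```

Even this toy model shows why layer ordering and chunk-size assignment matter: the expensive fine-grained layer only ever sees the residue of the earlier layers, so the cumulative time depends strongly on which layer comes first.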