Scaling erasure-coded storage clusters is indispensable for meeting growing storage-capacity and I/O-performance requirements. In this study, we propose an efficient scaling scheme for Reed-Solomon-coded storage clusters called Scale-RS, which has three salient features. First, Scale-RS achieves uniform data distribution by placing data blocks equally among old and new chunks using a transposed data layout. Second, Scale-RS minimizes the data movement incurred during data redistribution and parity update. Scale-RS not only reaches the lower bound of data-migration traffic by transferring only the necessary data blocks from old data chunks to new chunks, but also reduces update traffic by generating parity difference blocks from the data blocks stored in an individual data chunk. Third, Scale-RS improves the I/O performance of scaled storage clusters in terms of read parallelism and write throughput. We implement Scale-RS along with two alternative scaling schemes in a Reed-Solomon-coded storage cluster, on which real-world I/O traces are replayed. Experimental results demonstrate that Scale-RS achieves the highest read performance among the three scaling schemes after data redistribution. When scaling from six data chunks to nine, Scale-RS outperforms the other two scaling schemes in aggregate write throughput by factors of 2.85 and 3.05 under online filling and offline filling, respectively. We also show that user response time is slightly increased during data redistribution due to bandwidth competition between migration and user I/Os.
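The parity-difference update mentioned above can be sketched as follows. This is a minimal illustration of the general Reed-Solomon delta-update identity P' = P + g·(d' − d) over GF(2^8), where addition and subtraction are both XOR; the encoding coefficients, block values, and function names are illustrative assumptions, not the actual Scale-RS implementation.

```python
# Sketch: updating a Reed-Solomon parity byte from a parity difference,
# instead of re-reading all data blocks. Arithmetic is in GF(2^8) with
# the common reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) (Russian-peasant method)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce modulo 0x11B
    return p

def delta_update(parity_old, coeff, d_old, d_new):
    """P' = P + g*(d' - d); in GF(2^8) both + and - are XOR."""
    return parity_old ^ gf_mul(coeff, d_old ^ d_new)

# Check against full recomputation for one parity over two data blocks.
g = [3, 7]           # hypothetical encoding coefficients
d = [0x5A, 0xC3]     # old data blocks
P = gf_mul(g[0], d[0]) ^ gf_mul(g[1], d[1])  # full encode

d_new = 0x99                                  # block 0 is overwritten
P_delta = delta_update(P, g[0], d[0], d_new)  # delta path
P_full = gf_mul(g[0], d_new) ^ gf_mul(g[1], d[1])
assert P_delta == P_full
```

The point of the delta path is that the node holding the modified data chunk can compute `d_old ^ d_new` locally and ship only that difference to the parity chunks, which matches the abstract's claim of reduced update traffic.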
Continuous growth in data scale increases energy consumption and operating costs that cannot be ignored in cloud storage systems. Previous studies have shown that analyzing I/O access characteristics and mining data features is effective for achieving reasonable data distribution in storage systems. The granularity and criteria of classification are the key factors determining data distribution. To reduce energy consumption and operating costs, this paper proposes a fine-grained, climatic-season-based, energy-aware framework for cloud storage systems called CSEA. The framework comprises three aspects: (i) data feature mining: CSEA discovers potential data features by analyzing data accesses to support data classification; (ii) K-means clustering: CSEA applies this unsupervised machine-learning classification algorithm to divide data into categories based on seasonal characteristics gathered from real I/O accesses; (iii) fine-grained data distribution: building on seasonal features, CSEA fuses regional features to further refine the granularity of data distribution and save energy consumption and operating costs. Simulation experiments using an extended CloudSimDisk and the constructed mathematical models indicate that CSEA reduces energy consumption and operating costs compared with a single data-classification standard and coarse-grained data distribution.
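The seasonal K-means classification step can be sketched as follows. The feature vectors (normalized access counts per climatic season), cluster count, and synthetic data are illustrative assumptions, not CSEA's actual feature set or implementation; a minimal K-means is written out so the sketch is self-contained.

```python
# Sketch: clustering data objects by per-season access frequency so that
# seasonally hot and cold objects can be placed on different node groups.
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means over tuples of floats; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute centers; keep the old center if a cluster is empty
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl))
                   if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters

# Hypothetical access profiles: (spring, summer, autumn, winter) frequency.
data = [(0.1, 0.8, 0.1, 0.0), (0.0, 0.9, 0.1, 0.0),   # summer-hot objects
        (0.1, 0.0, 0.1, 0.8), (0.0, 0.1, 0.0, 0.9)]   # winter-hot objects
centers, clusters = kmeans(data, k=2)
```

After clustering, objects in the currently hot seasonal category would stay on active nodes while the rest can be consolidated onto low-power nodes, which is the placement decision the abstract's energy savings rest on.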