Summary
In Data Grid systems, fast data access is challenging because of high latency. Request failure is one of the most common problems in these systems, and it affects both performance and access delay. Job scheduling and data replication are the two main techniques for reducing access latency. In this paper, we propose two new neighborhood‐based job scheduling strategies and a novel neighborhood‐based dynamic data replication algorithm (NDDR). The proposed algorithms reduce access latency by taking a variety of practical parameters into account when making decisions, and they reduce access delay by considering the failure probability of a node in job scheduling, replica selection, and replica placement. The proposed neighborhood concept in job scheduling includes all nodes with low data transmission costs. As a result, the best computational node can be selected, and the search time reduced, by running a hierarchical and parallel search. NDDR reduces access latency by selecting the best replica through a hierarchical search based on access time, storage queue workload, storage speed, and failure probability. NDDR improves load balancing and data locality by selecting the best replication site with respect to workload and to temporal, geographical, and spatial locality. We evaluate the proposed algorithms with the OptorSim simulator in two scenarios. The simulations show that, compared with similar existing algorithms, the proposed algorithms improve mean job time, replication frequency, mean data access latency, and effective network usage by 11%, 15%, 12%, and 10%, respectively.
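The abstract lists the inputs to NDDR's replica selection (access time, storage queue workload, storage speed, and failure probability) and says the search is hierarchical over the neighborhood, but it does not give the scoring formula. The following is a hypothetical sketch, not the paper's method: it assumes that one access attempt costs transfer time plus queue wait plus read time, and that a failed attempt is retried independently, so the expected number of attempts is 1 / (1 − p). The class `Replica` and the functions `expected_access_time` and `select_replica` are illustrative names, and the cost model is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Replica:
    """Hypothetical replica descriptor; field names are illustrative only."""
    site: str
    file_id: str
    transfer_time: float   # estimated network transfer time (s)
    queue_workload: float  # bytes already waiting in the storage element's queue
    storage_speed: float   # storage read throughput (bytes/s)
    file_size: float       # size of the requested file (bytes)
    failure_prob: float    # probability that one access attempt fails, 0 <= p < 1


def expected_access_time(r: Replica) -> float:
    """Expected fetch time under an assumed independent-retry model.

    One attempt costs transfer + queue wait + read time. If each attempt
    fails independently with probability p and is retried, the expected
    number of attempts is 1 / (1 - p).
    """
    queue_wait = r.queue_workload / r.storage_speed
    read_time = r.file_size / r.storage_speed
    one_attempt = r.transfer_time + queue_wait + read_time
    return one_attempt / (1.0 - r.failure_prob)


def select_replica(file_id: str,
                   neighborhood: Iterable[Replica],
                   rest_of_grid: Iterable[Replica]) -> Optional[Replica]:
    """Hierarchical search: the low-cost neighborhood first, then the wider grid."""
    for tier in (neighborhood, rest_of_grid):
        candidates = [r for r in tier if r.file_id == file_id]
        if candidates:
            # Within a tier, pick the replica with the lowest expected access time.
            return min(candidates, key=expected_access_time)
    return None  # no replica of the file exists anywhere
```

For example, with a 100-byte file on 100 B/s storage and an empty queue, a neighbor with a 2 s transfer time but a 50% failure probability scores (2 + 1) / 0.5 = 6 s, so a fully reliable neighbor with a 3 s transfer time (4 s expected) is preferred. A faster replica outside the neighborhood is consulted only when no neighbor holds the file, which is what keeps the search short.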
The efficiency of data-intensive applications in distributed environments such as Cloud, Fog, and Grid depends directly on data access delay. Delays caused by queue workload and by failures can reduce data access efficiency. Data replication is a key technique for reducing access latency. In this paper, a fuzzy-based replication algorithm is proposed that avoids these delays by considering a comprehensive set of significant parameters to improve performance. The proposed algorithm selects the appropriate replica hierarchically, taking into account transmission cost, queue delay, and failure probability. It determines the best place for replication using a fuzzy inference system that considers queue workload, the predicted number of future accesses, last access time, and communication capacity. It uses the Simple Exponential Smoothing method to predict future file popularity. The proposed algorithm is evaluated with the OptorSim simulator under different access patterns. The results show that it improves performance in terms of the number of replications, the percentage of storage filled, and the mean job execution time. The algorithm is most efficient under random access patterns, especially random Zipf access patterns. It also performs well as the number of jobs and the file size increase.
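The abstract states that future file popularity is predicted with Simple Exponential Smoothing (SES) but gives neither the smoothing constant nor the initialization. SES updates its level as s_t = α·x_t + (1 − α)·s_{t−1}, and the last level serves as the one-step-ahead forecast. The sketch below assumes per-interval access counts as input, α = 0.5, and a level initialized to the first observation; these choices and the names `ses_forecast` and `rank_by_predicted_popularity` are assumptions for illustration.

```python
from typing import Dict, List


def ses_forecast(counts: List[float], alpha: float = 0.5) -> float:
    """One-step-ahead Simple Exponential Smoothing forecast.

    counts: access counts per past time interval, oldest first.
    alpha:  smoothing constant in (0, 1]; larger values weight recent intervals more.
    The level is initialized to the first observation (an assumed convention).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not counts:
        return 0.0  # no history: predict no accesses
    level = counts[0]
    for x in counts[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level


def rank_by_predicted_popularity(history: Dict[str, List[float]],
                                 alpha: float = 0.5) -> List[str]:
    """Return file IDs ordered from most to least popular in the next interval."""
    return sorted(history, key=lambda f: ses_forecast(history[f], alpha), reverse=True)
```

With α = 0.5, a file accessed 4, 0, and 8 times in the last three intervals is forecast at 5.0 accesses, while a file accessed 8, 4, and 0 times is forecast at 3.0, even though both have the same total. This recency weighting is why SES suits popularity prediction: a file whose demand is rising ranks above one whose demand is fading.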