A geographically distributed web server (GDWS) system, consisting of multiple server nodes interconnected by a MAN or a WAN, can handle ever-increasing volumes of web requests more efficiently than centralized web servers because its server nodes are closer to clients. It is also more scalable, since throughput is not limited by the bandwidth available on the links to a central server. The key research issue in the design of a GDWS is how to replicate and distribute the documents of a website among the server nodes. This paper proposes a density-based replication scheme and applies it to our proposed Extensible GDWS (EGDWS) architecture. Its document distribution scheme supports partial replication that targets only the hot objects among the documents. To distribute the replicas generated by the density-based replication scheme, we propose four document distribution algorithms: Greedy-cost, Maximal-density, Greedy-penalty, and Proximity-aware. A proximity-based routing mechanism is designed to incorporate these algorithms and achieve better web server performance in a WAN environment. Simulation results show that our document replication and distribution algorithms achieve better response times and load balancing than existing dynamic schemes. To further reduce user response time, we propose two document grouping algorithms that reduce request redirection overhead.
How the documents of a Web site are replicated and where they are placed among the server nodes have an important bearing on load balance in a Distributed Web Server (DWS) system. The traffic generated by moving documents at runtime during load balancing can also affect the performance of the DWS system. In this paper, we prove that minimizing such traffic in a DWS system is NP-hard. We propose several heuristic document distribution schemes that perform partial replication of a site's documents at selected server locations so that load balancing is maintained. We simulate these schemes using both a synthetic workload and real log data. The simulation results show that, with an additional 50% of storage for replication, our heuristics can improve load balancing performance in the DWS system by 48%, and that the internal traffic due to document movement has a negligible effect on the system's performance.
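The idea of partial replication for load balancing can be illustrated with a minimal sketch. This is not one of the heuristics from the paper, whose details the abstract does not give; it is a generic greedy scheme written under stated assumptions: each document has a known request load, every server can hold any document, a document's load is split evenly across its copies, and the extra storage is expressed as a count of additional replicas. The names `place_documents`, `doc_loads`, and `extra_replicas` are hypothetical.

```python
def place_documents(doc_loads, num_servers, extra_replicas):
    """Greedy sketch: place one copy of each document, then replicate hot ones.

    doc_loads: dict mapping document name -> request load (assumed known).
    extra_replicas: number of additional copies allowed (assumed budget).
    Returns (placement, server_load).
    """
    server_load = [0.0] * num_servers
    placement = {}  # document -> list of server indices holding a copy

    # Step 1: hottest documents first, each onto the least-loaded server.
    for doc, load in sorted(doc_loads.items(), key=lambda kv: -kv[1]):
        target = min(range(num_servers), key=lambda s: server_load[s])
        placement[doc] = [target]
        server_load[target] += load

    # Step 2: spend the replica budget on the document whose
    # per-copy load is currently the highest.
    for _ in range(extra_replicas):
        candidates = [d for d in placement if len(placement[d]) < num_servers]
        if not candidates:
            break
        doc = max(candidates, key=lambda d: doc_loads[d] / len(placement[d]))
        holders = placement[doc]
        old_share = doc_loads[doc] / len(holders)
        new_share = doc_loads[doc] / (len(holders) + 1)
        # The new copy goes to the least-loaded server without one.
        new_server = min(
            (s for s in range(num_servers) if s not in holders),
            key=lambda s: server_load[s],
        )
        for s in holders:
            server_load[s] += new_share - old_share
        holders.append(new_server)
        server_load[new_server] += new_share

    return placement, server_load
```

Calling `place_documents({"a": 60, "b": 20, "c": 10, "d": 10}, 3, 1)` places every document once and then gives the hottest document, `a`, a second copy. A greedy step of this kind does not guarantee that the peak load falls in every case, so a more careful scheme would verify the improvement before committing a replica.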