Global Computing achieves high-throughput computing by harvesting a very large number of unused computing resources connected to the Internet. This parallel computing model targets a parallel architecture defined by a very high number of nodes, poor communication performance and continuously varying resources. The unprecedented scale of the Global Computing architecture paradigm requires revisiting many basic issues related to parallel architecture: programming models, performance models, and the classes of applications or algorithms suited to this architecture. XtremWeb is an experimental Global Computing platform dedicated to providing a tool for such studies. This paper presents the design of XtremWeb. Two essential features of this design are multi-application support and high performance. Accepting multiple applications allows institutions or enterprises to set up their own Global Computing applications or experiments. High performance is ensured by scalability, fault tolerance, efficient scheduling and a large base of volunteer PCs. We also present an implementation of the first global application running on XtremWeb.
Global Computing platforms, large-scale clusters and future TeraGRID systems gather thousands of nodes for computing parallel scientific applications. At this scale, node failures or disconnections are frequent events. This volatility reduces the MTBF of the whole system to the range of hours or minutes. We present MPICH-V, an automatic volatility-tolerant MPI environment based on uncoordinated checkpoint/roll-back and distributed message logging. The MPICH-V architecture relies on Channel Memories, Checkpoint Servers and theoretically proven protocols to execute existing or new SPMD and Master-Worker MPI applications on volatile nodes. To evaluate its capabilities, we run MPICH-V within a framework in which the number of nodes, Channel Memories and Checkpoint Servers, as well as the node volatility, can be completely configured. We present a detailed performance evaluation of every component of MPICH-V and of its global performance for non-trivial parallel applications. Experimental results demonstrate good scalability and high tolerance to node volatility.
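The claim that volatility brings the system MTBF down to hours or minutes can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes that node failures are independent and exponentially distributed, and that the loss of any single node interrupts the whole computation (as it would for an MPI job without fault tolerance). Under those assumptions the system MTBF is the per-node MTBF divided by the number of nodes. The function name `system_mtbf_hours` and the per-node figure of 1,000 hours are hypothetical, chosen only for illustration.

```python
def system_mtbf_hours(node_mtbf_hours: float, node_count: int) -> float:
    """Mean time between failures of the whole system, in hours.

    Assumption (not from the paper): node failures are independent and
    exponentially distributed, and any single node failure counts as a
    system failure. The time to the first failure among N such nodes is
    then exponential with N times the per-node rate, so the mean is
    node_mtbf_hours / node_count.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_mtbf_hours <= 0:
        raise ValueError("node_mtbf_hours must be positive")
    return node_mtbf_hours / node_count


if __name__ == "__main__":
    # Hypothetical per-node MTBF, used only to show the trend with scale.
    node_mtbf = 1_000.0
    for nodes in (1, 100, 1_000, 10_000):
        hours = system_mtbf_hours(node_mtbf, nodes)
        print(f"{nodes:>6} nodes: system MTBF = {hours:g} h ({hours * 60:g} min)")
```

With these illustrative numbers, a per-node MTBF of 1,000 hours yields a system MTBF of one hour at 1,000 nodes and of a few minutes at 10,000 nodes, which is the regime where automatic checkpointing and message logging become necessary.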