Distributing applications over PC clusters to speed up or size up their execution is now commonplace, yet tolerating the faults of these systems efficiently remains a major issue. To ease the addition of checkpoint-based fault tolerance at the application level, we introduce a Model for Low-Overhead Tolerance of Faults (MoLOToF), which is based on structuring applications using fault-tolerant skeletons. MoLOToF also encourages collaboration with the programmer and the execution environment. The skeletons are adapted to specific parallelization paradigms and yield what can be called fault-tolerant algorithmic skeletons. Applying MoLOToF to the SPMD parallelization paradigm results in our proposed FT-SPMD framework. Experiments show that the complexity of developing an application is low and that using the framework has a small impact on performance. Comparisons with existing system-level checkpoint solutions, namely LAM/MPI and DMTCP, show that FT-SPMD has a lower runtime overhead while being more robust when a higher level of fault tolerance is required.
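To make the idea of application-level, checkpoint-based fault tolerance concrete, the following is a minimal single-process sketch of a checkpointed iteration loop. It is not the MoLOToF or FT-SPMD API: the names `save_checkpoint`, `load_checkpoint`, `run_with_checkpoints`, and the `fail_at` fault-injection parameter are all hypothetical, and the sketch assumes the application state can be serialized with `pickle`.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, iteration, state):
    """Write (iteration, state) atomically so a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump((iteration, state), f)
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path, initial_state):
    """Return the last saved (iteration, state), or a fresh start if none exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return 0, initial_state


def run_with_checkpoints(step, initial_state, n_iters, path, every=1, fail_at=None):
    """Hypothetical skeleton: the programmer supplies `step`, the loop handles recovery.

    `fail_at` only simulates a fault for demonstration purposes.
    """
    iteration, state = load_checkpoint(path, initial_state)
    while iteration < n_iters:
        if fail_at is not None and iteration == fail_at:
            raise RuntimeError("simulated fault")
        state = step(iteration, state)
        iteration += 1
        # Checkpoint every `every` iterations and once more at the very end.
        if iteration % every == 0 or iteration == n_iters:
            save_checkpoint(path, iteration, state)
    return state
```

Calling `run_with_checkpoints` again with the same `path` after a failure resumes from the last saved iteration instead of restarting from zero. A real SPMD setting would additionally need the processes to agree on a consistent checkpoint, which this sketch does not address.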
This paper introduces a research project that aims to speed up and size up gas storage valuations based on a Stochastic Dynamic Programming algorithm. Such valuations are typically needed by investment projects and yield prices for gas storage spaces and facilities. However, they involve computations that require large amounts of CPU power or memory. As a result, their parallelization on PC clusters or supercomputers becomes highly attractive, and sometimes unavoidable, despite its complexity. Our parallelization strategy is based on a message-passing paradigm and distributes both computations and data across a cluster in order to achieve speed-up and size-up. It includes complex, optimized data exchanges that are dynamically computed, planned, and carried out at each computation step. This optimized data distribution and memory management makes it possible to process large problems on a large number of processors. Moreover, our parallel implementation supports different price models, and our first experiments on a standard 32-PC cluster show very good performance, particularly for complex price models.
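As an illustration of the kind of computation being parallelized, here is a deliberately small, serial sketch of backward induction for a storage valuation. It is a toy under stated assumptions, not the authors' model: prices are independent across time steps and given as explicit scenario lists, the operator observes the price before acting, inventory moves by at most one unit per step, and gas left at the end is worth nothing. The function name `storage_value` and its parameters are hypothetical.

```python
def storage_value(price_scenarios, n_levels, start_level=0):
    """Expected value of a toy gas storage under an optimal policy.

    price_scenarios[t] is a list of (probability, price) pairs for step t.
    Inventory levels are the integers 0 .. n_levels - 1.
    """
    # Terminal condition (assumption): leftover gas has no value.
    next_value = [0.0] * n_levels
    # Walk backward in time, from the last step to the first.
    for scenarios in reversed(price_scenarios):
        value = [0.0] * n_levels
        for level in range(n_levels):
            expected = 0.0
            for prob, price in scenarios:
                best = float("-inf")
                for action in (-1, 0, 1):  # withdraw, hold, inject one unit
                    new_level = level + action
                    if 0 <= new_level < n_levels:
                        cash = -action * price  # injecting costs, withdrawing earns
                        best = max(best, cash + next_value[new_level])
                expected += prob * best
            value[level] = expected
        next_value = value
    return next_value[start_level]
```

Each time step depends on the full value array of the following step, which hints at why a distributed version has to exchange data between processes at every computation step. The sketch makes no attempt to model that distribution.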