With the evolution of high-performance computing towards heterogeneous, massively parallel systems, parallel applications have developed new fault tolerance needs. Checkpointing has become a widely used technique to obtain fault tolerance. Whether recovering from a failure during execution or migrating processes to different machines, checkpointing tools must be able to operate in heterogeneous environments. Portable checkpointers usually work around portability issues at the cost of transparency: the user must provide information such as what data needs to be stored, where to store it, or where to checkpoint. CPPC (Controller/Precompiler for Portable Checkpointing) is a checkpointing tool designed to feature both portability and transparency. It is made up of a library containing checkpointing routines and a compiler which automates the use of the library. This paper gives an overview of the CPPC tool. Experimental results using benchmarks and large-scale real applications are included, demonstrating usability, efficiency and portability.
The scalability of High Performance Computing (HPC) applications depends heavily on the efficient support of network communications in virtualized environments. However, Infrastructure as a Service (IaaS) providers are more focused on deploying systems with higher computational power interconnected via high-speed networks than on improving the scalability of the communication middleware. This paper analyzes the main performance bottlenecks in the scalability of HPC applications on the Amazon EC2 Cluster Compute platform by: (1) evaluating the communication performance on shared memory and a virtualized 10 Gigabit Ethernet network; (2) assessing the scalability of representative HPC codes, the NAS Parallel Benchmarks, using a significant number of cores, up to 512; (3) analyzing the new cluster instances (CC2) in terms of single instance performance, scalability and cost-efficiency; (4) suggesting techniques for reducing the impact of the virtualization overhead on the scalability of communication-intensive HPC codes, such as giving the Virtual Machine direct access to the network and reducing the number of processes per instance; and (5) proposing the combination of message-passing with multithreading as the most scalable and cost-effective option for running HPC applications on the Amazon EC2 Cluster Compute platform.
The current trend towards multicore architectures underscores the need for parallelism. While new languages and alternatives for supporting these systems more efficiently are proposed, MPI faces this new challenge. Therefore, up-to-date performance evaluations of current options for programming multicore systems are needed. This paper evaluates MPI performance against Unified Parallel C (UPC) and OpenMP on multicore architectures. From the analysis of the results, it can be concluded that MPI is generally the best choice on multicore systems with both shared and hybrid shared/distributed memory, as it takes the greatest advantage of data locality, the key factor for performance in these systems. UPC, although it efficiently exploits the data layout in memory, suffers from remote shared memory accesses, whereas OpenMP usually lacks efficient data locality support and is restricted to shared memory systems, which limits its scalability.