Over the last twenty years, the open source community has provided more and more software on which the world's High Performance Computing (HPC) systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. But although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and key integration of technologies necessary to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and GPUs. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in a coordinated International Exascale Software Project.
This paper introduces a high performance communication middle layer, called PM2, for heterogeneous network environments. PM2 currently supports Myrinet, Ethernet, and SMP. Binary code written against PM2, or against a communication library built on top of PM2 such as MPICH-SCore, can run on any combination of these networks without recompilation. According to a set of NAS parallel benchmark results, MPICH-SCore outperforms dedicated communication libraries such as MPICH-BIP/SMP and MPICH-GM on some benchmark programs.
A high performance communication facility, called the GigaE PM, has been designed and implemented for parallel applications on clusters of computers using Gigabit Ethernet. The GigaE PM not only provides reliable, high-bandwidth, low-latency communication, but also supports existing network protocols such as TCP/IP. The design of the GigaE PM assumes that the Gigabit Ethernet card has a dedicated processor whose program can be modified. A reliable communication mechanism for parallel applications is implemented in the firmware, while existing network protocols are handled by the operating system kernel. A prototype system has been implemented using an Essential Communications Gigabit Ethernet card. The performance results show that a 48.3 μs round trip time for a four byte user message and a bandwidth of 56.7 MBytes/sec for a 1,468 byte message have been achieved on Intel Pentium II 400 MHz PCs. We have implemented MPICH-PM on top of the GigaE PM and evaluated its performance using the NAS parallel benchmarks. The results show that the IS class S benchmark runs 1.8 times faster on the GigaE PM than on TCP/IP.