The Message Passing Interface (MPI) is a crucial programming tool for enabling communication between processes in parallel applications. The goal of MPI users is to allocate tasks to processors in a way that maximizes both spatial and temporal locality in the network. However, this can be challenging, especially in large-scale networks where maximizing processor locality may not be feasible at runtime. To address this issue, we propose Hamorder, an offline node reassignment approach that takes physical processor locations into account via graph reordering for Random network topologies. Hamorder aims to optimize task mapping for improved performance in parallel applications, whether across multiple tasks or within a single task. Additionally, we investigate the potential of improving MPI applications through runtime parameter tuning based on Hamorder. Our evaluation shows that Hamorder delivers a 27.3% performance improvement on Random topologies over Gorder, a state-of-the-art algorithm that enhances cache locality by rearranging a graph's vertices so that vertices typically accessed together are placed in close proximity. Moreover, our autotuning framework using Hamorder achieves an average speedup of 1.38x for targeted MPI applications by searching through various runtime parameter combinations.
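The abstract does not specify Hamorder's reordering algorithm, so the following is only a minimal sketch of the general idea behind locality-oriented graph reordering: relabeling vertices so that vertices connected in the communication graph receive nearby IDs. The function name `bfs_reorder` and the BFS strategy are illustrative assumptions, not the paper's method (Gorder and Hamorder use more sophisticated orderings).

```python
from collections import deque

def bfs_reorder(adjacency):
    """Relabel vertices in BFS order so that neighboring (i.e., frequently
    co-accessed) vertices receive nearby new IDs.

    `adjacency` maps each vertex to a list of its neighbors. Returns a dict
    mapping old vertex IDs to new, locality-aware IDs.
    """
    new_id = {}
    next_id = 0
    for start in adjacency:          # loop covers disconnected components
        if start in new_id:
            continue
        new_id[start] = next_id
        next_id += 1
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in new_id:  # first visit assigns the next free ID
                    new_id[w] = next_id
                    next_id += 1
                    queue.append(w)
    return new_id

# Tiny communication graph: 0-3, 1-2, 2-3 exchange messages.
graph = {0: [3], 1: [2], 2: [1, 3], 3: [0, 2]}
print(bfs_reorder(graph))  # {0: 0, 3: 1, 2: 2, 1: 3}
```

After relabeling, vertices 0 and 3 (and likewise 3 and 2) hold adjacent IDs, so a mapping that assigns consecutive IDs to nearby processors keeps communicating pairs physically close.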
The use of approximate communication has emerged as a promising approach for enhancing the efficiency of communication in parallel computer systems. By sending incomplete or imprecise messages, approximate communication can significantly reduce communication time. In this study, we examine application-level techniques for approximate communication that enable high portability on high-performance interconnection networks. Specifically, we focus on lossy compression of floating-point data, which is frequently exchanged between compute nodes in parallel applications. Our approach involves a simple application scenario in which a source process compresses a communication dataset and a destination process decompresses it in an MPI parallel program. We use two bitwise procedures for compression: lossy bitzip compression and lossless bit-mask compression. Our aim is to transmit the largest possible amount of approximate data with the least possible compression overhead. Additionally, we explore error-checking and correction techniques to ensure bit-flip fault tolerance for the compressed data during transmission. We implement our scheme in several communication-intensive MPI applications and demonstrate that our approximate communication approach effectively speeds up total execution time while staying within a specified quality-of-result error bound.
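The abstract's bitzip and bit-mask procedures are not detailed here, so the following is a minimal sketch of the underlying idea of lossy bitwise compression of floats with a bounded relative error: zeroing the low mantissa bits of an IEEE 754 double so the resulting zero tail transmits or compresses cheaply. The function name `truncate_mantissa` is an illustrative assumption; in a real MPI program the truncated buffer would be packed on the sender and unpacked on the receiver.

```python
import struct

def truncate_mantissa(value, keep_bits):
    """Lossy-compress a float64 by zeroing its low mantissa bits.

    A float64 carries a 52-bit mantissa; keeping only the top `keep_bits`
    bits bounds the relative error by 2**-keep_bits while leaving a long
    run of zero bits that is cheap to encode or elide in transmission.
    """
    # Reinterpret the double as a 64-bit unsigned integer.
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    # Mask off the low (52 - keep_bits) mantissa bits.
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack('<d', struct.pack('<Q', bits & mask))[0]

x = 3.141592653589793
approx = truncate_mantissa(x, 20)          # keep 20 of 52 mantissa bits
assert abs(approx - x) / abs(x) < 2 ** -20  # bounded relative error
```

Truncation toward zero keeps the sign and exponent intact, so the error bound holds uniformly across magnitudes; an application can pick `keep_bits` to match its quality-of-result error bound.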