SUMMARY

In the last two years, OpenMP has been gaining popularity as a standard for developing portable shared memory parallel programs. With the improvements in centralized shared memory technologies and the emergence of distributed shared memory (DSM) architectures, several medium-to-large physical and logical shared memory configurations are now available. Thus, OpenMP stands to be a promising medium for developing scalable and portable parallel programs.

In this paper, we focus on evaluating the suitability of OpenMP for developing scalable and portable irregular applications. We examine the programming paradigms supported by OpenMP that are suitable for this important class of applications, the performance and scalability achieved with these applications, the achieved locality and uniprocessor cache performance and the factors behind imperfect scalability. We have used two irregular applications and one NAS irregular code as the benchmarks for our study. Our experiments have been conducted on a 64-processor SGI Origin 2000.

Our experiments show that reasonably good scalability is possible using OpenMP if careful attention is paid to locality and load balancing issues. Particularly, using the Single Program Multiple Data (SPMD) paradigm for programming is a significant win over just using loop parallelization directives. As expected, the cost of remote accesses is the major factor behind imperfect speedups of SPMD OpenMP programs.
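The difference between the two programming styles named in the summary can be illustrated with a small sketch. The code below is not taken from the paper or its benchmarks: the kernel (an indirect gather over CSR-style row lists) and the names `gather_loop` and `gather_spmd` are hypothetical, chosen only to show an irregular access pattern. The first function relies on a loop parallelization directive. The second uses one parallel region in which each thread computes its own block of rows from its thread id, which is the usual shape of an SPMD OpenMP program. The simple block partition is an assumption; an irregular application would likely need a partition that balances work per row.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical irregular kernel: y[i] is the sum of x[col[k]] over the
   entries of row i. Rows are stored CSR-style; row_ptr has n + 1 entries. */

/* Style 1: loop parallelization directive. The runtime decides which
   thread executes which iterations each time the loop is entered. */
void gather_loop(int n, const int *row_ptr, const int *col,
                 const double *x, double *y)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += x[col[k]];
        y[i] = sum;
    }
}

/* Style 2: SPMD. A single parallel region; each thread derives a fixed
   block of rows from its thread id and works only on that block. */
void gather_spmd(int n, const int *row_ptr, const int *col,
                 const double *x, double *y)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        int tid = 0, nthreads = 1;   /* serial fallback without OpenMP */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        /* Assumed partition: contiguous blocks of ceil(n / nthreads) rows. */
        int chunk = (n + nthreads - 1) / nthreads;
        int lo = tid * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;

        for (int i = lo; i < hi; i++) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += x[col[k]];
            y[i] = sum;
        }
    }
}
```

Both functions produce the same result. The point of the SPMD form is that the row ownership is explicit and can be reused across every phase of a program, so a thread keeps working on the same data, which is one way to pay the "careful attention to locality" that the summary calls for. The sketch compiles with or without OpenMP enabled (for example, with or without `-fopenmp` in GCC).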
Six application benchmarks, including four numerical aerodynamic simulation (NAS) codes, provided by H. Jin and J. Wu, were previously parallelized using OpenMP and the Message Passing Interface (MPI) and run on a 128-processor Silicon Graphics Inc. (SGI) Origin 2000. Detailed profile data were collected to understand the factors causing imperfect scalability. The results show that load imbalance and the cost of remote accesses are the main factors limiting the speedup of the OpenMP versions, whereas communication costs are the single major factor in the performance of the MPI versions.
Traditionally, symmetric multiprocessors have used modest numbers of processors. Since many of them were bus-based systems, they inherently lacked scalability to what might be referred to as moderate-sized systems. With