Summary
The 3‐P challenge of high‐performance programming—performance, portability and productivity—has become more difficult than ever in the age of heterogeneous computing. It would be naïve to think that the performance portability problem can be completely solved, but it can certainly be reduced and made tolerable. However, first and foremost, an agreement is needed on what it means for an application to be performance portable. Unfortunately, there is still no consensus in the scientific community on a workable definition of the term performance portability. Several years ago, a comprehensive effort was made to formulate a novel definition of performance portability and an associated metric. Since the new metric was first introduced, it has been widely adopted by the scientific community, and many advanced studies have used it. Unfortunately, the definition of the new metric has flaws. This article presents a proof of the theoretical flaws in the definition of the new metric, considers the practical implications of these flaws as reflected in many studies that have used it in recent years, and proposes a revised metric that addresses the flaws and provides guidelines on how to use it correctly.
Parallelization lets applications exploit the high throughput of new multicore processors, and the OpenMP parallel programming model helps developers create multithreaded applications.
SUMMARYThe rapid rise of OpenMP as the preferred parallel programming paradigm for small-to-medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model-of-choice for large scale high-performance parallel computing in the coming decade.The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc-NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy-inside-copy-back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automat and implicit approaches. Moreover, the CC approach improves on the OpenMP-SPMD style of programming that makes the development process of an OpenMP application more structured and simpler.The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse-grained parallelism, although a fast copy mechanism is essential.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.