Today's massively parallel machines are typically message-passing systems consisting of hundreds or thousands of processors. Implementing parallel applications efficiently in this environment is a challenging task, and poor parallel design decisions can be expensive to correct. Tools and techniques that allow the fast and accurate evaluation of different parallelization strategies would significantly improve the productivity of application developers and increase throughput on parallel architectures. This paper investigates one of the major issues in building tools to compare parallelization strategies: determining what type of performance models of the application code and of the computer system are sufficient for a fast and accurate comparison of different strategies. The paper is built around a case study employing the Performance Prediction Tool (PerPreT) to predict performance of the Parallel Spectral Transform Shallow Water Model code (PSTSWM) on the Intel Paragon.
PSTSWM is a parallel application code that was designed to evaluate different parallel strategies for the spectral transform method as it is used in climate modeling and weather forecasting. Multiple parallel algorithms and algorithm variants are embedded in the code. PerPreT uses a relatively simple algebraic model to predict execution time for SPMD (Single Program Multiple Data) parallel applications. Applications are modeled through parameterized formulae for communication and computation, where the parameters include the problem size, the number of processors used to execute the program, and system characteristics (e.g., setup times for communication, link bandwidth, and sustained computing performance per processor).
In this paper we describe performance models that predict the performance of the different algorithms in PSTSWM accurately enough to allow them to be compared, establishing the feasibility of such a demanding application of performance modeling. We also discuss issues in generating and validating the performance models, emphasizing the practical importance of tools such as PerPreT in such studies.
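The kind of parameterized algebraic model described above can be illustrated with a minimal sketch. This is not PerPreT's actual set of formulae, which the abstract does not give. It assumes a simple additive model (a computation term plus a communication term), and every name in it (`Machine`, `predicted_time`, `best_strategy`) and every number is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical system characteristics (illustrative names and units)."""
    setup_time_s: float       # per-message communication setup time, seconds
    bandwidth_bytes_s: float  # link bandwidth, bytes per second
    flops_per_s: float        # sustained computing rate per processor


def predicted_time(total_flops, msgs_per_proc, bytes_per_msg, procs, machine):
    """Assumed model: execution time = computation term + communication term."""
    # Work is assumed to be divided evenly among the processors.
    compute = total_flops / (procs * machine.flops_per_s)
    # Each message costs a fixed setup time plus its transfer time.
    per_message = machine.setup_time_s + bytes_per_msg / machine.bandwidth_bytes_s
    return compute + msgs_per_proc * per_message


def best_strategy(candidates, total_flops, procs, machine):
    """Return the name of the candidate with the smallest predicted time.

    `candidates` maps a strategy name to (msgs_per_proc, bytes_per_msg).
    """
    return min(
        candidates,
        key=lambda name: predicted_time(total_flops, *candidates[name], procs, machine),
    )


if __name__ == "__main__":
    machine = Machine(setup_time_s=1e-4, bandwidth_bytes_s=1e8, flops_per_s=1e7)
    # Two made-up strategies that move the same data volume per processor.
    candidates = {"many_small": (100, 1e4), "few_large": (10, 1e5)}
    for name, (msgs, size) in candidates.items():
        print(name, predicted_time(1e9, msgs, size, 100, machine))
    print("best:", best_strategy(candidates, 1e9, 100, machine))
```

Because the model is purely algebraic, evaluating a strategy for a new problem size or processor count costs only a few arithmetic operations, which is what makes comparing many candidate strategies fast.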
This paper describes a measurement study of the effects of thread placement on memory access times on the Kendall Square multiprocessor, the KSR1. The KSR1 uses a conventional shared memory programming model in a distributed memory architecture. The architecture is based on a ring of rings of 64-bit superscalar microprocessors. The KSR1 has a Cache-Only Memory Architecture (COMA). Memory consists of the local cache memories attached to each processor. Whenever an address is accessed, the data item is automatically copied to the local cache memory module, so that access times for subsequent references will be minimal. If a local cache has space allocated for a particular data item, but does not have a current valid copy of that data item, then it is possible for the cache to acquire a valid read-only copy before it is requested by the local processor due to a request by a different processor that happens to pass by on the ring. This automatic prefetching can greatly reduce the average time for a thread to acquire data items. Because of the automatic prefetching, the time required to obtain a valid copy of a data item does not depend simply on the distance from the owner of the data item, but also depends on the placement and number of other processing threads which share the same data item. Also, the strategic placement of processing threads helps programs take advantage of the unique features of the memory architecture which help eliminate memory access bottlenecks for shared data sets. Experiments run on the KSR1 across a wide variety of thread configurations show that shared memory access is accelerated through strategic placement of threads which share data. The results indicate strategies for improving the performance of applications programs, and illustrate that KSR1 memory access times can remain nearly constant even when the number of participating threads increases.