In this paper, we study the impact of two memory architectures, distributed memory (DM) and virtual shared memory (VSM), on the solution of parallel numerical algorithms on a cluster of multi-processor nodes. The parallel implementation of the shallow water equations to model the tsunami effect is chosen as the case study. Data is partitioned into sub domains using two schemes, a three into four grid scheme and a six into eight grid scheme, both of which are used for the parallel implementation of this model. We present four parallel algorithms for each grid scheme: distributed memory without threads, distributed memory with threads, virtual shared memory without threads, and virtual shared memory with threads. These eight parallel algorithms have been implemented on a high performance cluster connected to the "Nordugrid". Experiments are carried out using the Message Passing Interface (MPI) library, C/Linda, and Linux pthreads. Subject to available memory, the virtual shared memory version without threads performs best, but as the task is scaled up, the threaded version becomes efficient in both the DM and VSM implementations.
I. INTRODUCTION
The atmospheric model known as the shallow water equation model has been widely used to study the tsunami effect [4,5]. The model consists of a set of partial differential equations describing the basic fluid dynamical behaviour of the atmosphere. Solving partial differential equations numerically for real-life problems has always been considered computationally demanding, and the need for more efficient solution techniques and faster computers seems unlimited. Developing numerical techniques and learning how to use supercomputers and clusters efficiently will therefore remain important for many years to come, in order to accommodate the increasing need for computational efficiency [1].
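The partitioning of data into sub domains mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the "three into four" scheme means a 3 x 4 arrangement of rectangular blocks (12 sub domains) and the "six into eight" scheme a 6 x 8 arrangement (48 sub domains); the source does not spell this out. The function names `block_range` and `partition_grid` and the 120 x 160 grid size are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def block_range(n, parts, index):
    """Half-open [start, stop) range of block `index` when n cells
    are split into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks each receive one additional cell.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def partition_grid(nx, ny, prows, pcols):
    """Map each sub domain id (row-major) to its (row range, column range).

    Assumption: sub domains are rectangular blocks laid out on a
    prows x pcols arrangement, e.g. 3 x 4 or 6 x 8.
    """
    subdomains = {}
    for r in range(prows):
        for c in range(pcols):
            rank = r * pcols + c
            subdomains[rank] = (block_range(nx, prows, r),
                                block_range(ny, pcols, c))
    return subdomains


# Hypothetical 120 x 160 grid split under both assumed schemes.
coarse = partition_grid(120, 160, 3, 4)   # 12 sub domains
fine = partition_grid(120, 160, 6, 8)     # 48 sub domains
```

In a distributed memory setting each sub domain id would typically correspond to one process, while in a threaded variant several blocks could be handled by threads inside one node; how the paper assigns them is not stated in this excerpt.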
Strategies to improve the accuracy and overall quality of model predictions have been and continue to be of great interest to numerical model developers. In addition to accuracy, the utility of a numerical model is affected by the algorithm's efficiency [3]. Parallel computing provides a feasible and efficient approach to solving very large-scale prediction problems. A transposition of the data is required each time the direction of data dependence changes. In a parallel computation, every processor then has to communicate with the others to redistribute the data, which is a potentially time-consuming task on parallel architectures [2]. Figure 1 shows the memory hierarchy that exists in most nodes of modern clustering environments [11]. Globally, many nodes are linked together by a high-speed network; inside each node there may be two or more processors; and each processor's memory accesses go either to a high-speed memory unit, the "cache", or to the low-speed "main memory".
In this paper, we study the impact of two overlay memory architectures, distributed memory and virtual shared memory, in the presence of multiple levels of parallelism, for the solution of numerical algorithms. The parallel implement...