We present a GPU-accelerated nodal discontinuous Galerkin method for the solution of the three-dimensional Euler equations that govern the motion and thermodynamic state of the atmosphere. Accelerating the dynamical core of atmospheric models plays an important practical role not only in producing daily forecasts faster but also in obtaining more accurate (higher-resolution) results within a given simulation time limit. We use algorithms suited to the single-instruction, multiple-thread (SIMT) architecture of GPUs to accelerate our model by two orders of magnitude relative to one core of a CPU. Tests on one node of the Titan supercomputer show a speedup of up to 15 times on the K20X GPU compared with the 16-core AMD Opteron CPU. The scalability of the multi-GPU implementation is tested using 16384 GPUs, yielding a weak scaling efficiency of about 90%. Finally, the accuracy and performance of our GPU implementation are verified using several benchmark problems representative of different scales of atmospheric dynamics.
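As a rough illustration of why nodal DG maps well to SIMT hardware (a minimal sketch, not the paper's actual kernels), the volume part of the DG residual reduces to small, dense, element-local operations with no coupling between elements, so each element can be assigned to its own GPU thread block and each nodal point to a thread. The element count, polynomial order, and toy flux below are illustrative placeholders.

```python
import numpy as np

# Illustrative sizes (placeholders, not the paper's configuration)
n_elem, n_nodes = 1024, 8               # elements, nodal points per element (1D slice)
D = np.random.rand(n_nodes, n_nodes)    # nodal differentiation matrix, identical for every element
q = np.random.rand(n_elem, n_nodes)     # conserved variable, one row per element

def volume_residual(q, D):
    """Element-local part of the DG residual: a dense matvec per element.

    Each row of q is independent of all others, so on a GPU each element can be
    handled by its own thread block; this data-parallel structure is what makes
    nodal DG SIMT-friendly.
    """
    flux = 0.5 * q**2                   # toy Burgers-type flux standing in for the Euler flux
    return -flux @ D.T                  # apply D to every element at once (batched matvec)

r = volume_residual(q, D)
print(r.shape)                          # (n_elem, n_nodes): one contribution per element, computed independently
```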
We present the acceleration of an IMplicit-EXplicit (IMEX) non-hydrostatic atmospheric model on manycore processors such as GPUs and Intel's MIC architecture. IMEX time integration methods sidestep the constraint imposed by the Courant-Friedrichs-Lewy (CFL) condition on explicit methods through corrective implicit solves within each time step. In this work, we implement and evaluate the performance of IMEX on manycore processors relative to explicit methods. Using 3D-IMEX at Courant number C=15, we obtained a speedup of about 4X relative to an explicit time-stepping method run with the maximum allowable C=1. Moreover, the unconditional stability of IMEX with respect to the fast waves means the speedup can increase significantly with the Courant number as long as the accuracy of the resulting solution remains acceptable. We show a speedup of 100X at C=150 using 1D-IMEX to demonstrate this point. Several improvements to the IMEX procedure were necessary in order to outperform our explicit results: a) reducing the number of degrees of freedom of the IMEX formulation by forming the Schur complement; b) formulating a horizontally explicit, vertically implicit (HEVI) 1D-IMEX scheme that has a lower workload and potentially better scalability than 3D-IMEX; c) using high-order polynomial preconditioners to reduce the condition number of the resulting system; d) using a direct solver for the 1D-IMEX method by performing and storing LU factorizations once, giving a constant cost for any Courant number. Without all of these improvements, explicit time integration methods proved difficult to beat. We discuss in detail the IMEX infrastructure required for formulating and implementing efficient methods on manycore processors. Several parametric studies are conducted to quantify the gain from each of the above-mentioned improvements. Finally, we validate our results against standard benchmark problems in numerical weather prediction and evaluate the performance and scalability of the IMEX method using up to 4192 GPUs and 16 Knights Landing processors.
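The direct-solver idea in item d) can be sketched as follows (a hedged illustration under simplifying assumptions, not the model's actual operator): because the HEVI implicit system couples only the nodes within each vertical column, one small matrix can be LU-factored once, outside the time loop, and merely back-substituted for every column at every step. The column size, operator, and time step below are invented for illustration.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Toy vertical-column operator (placeholder for the linearized fast-wave operator)
n_z = 64                               # nodal points in one vertical column (illustrative)
L = (np.diag(-2.0 * np.ones(n_z))
     + np.diag(np.ones(n_z - 1), 1)
     + np.diag(np.ones(n_z - 1), -1))

dt = 0.5                               # implicit step size; a larger Courant number only changes dt
A = np.eye(n_z) - dt * L               # (I - dt*L): the per-column implicit system

# Factor once, outside the time loop: the cost of each implicit solve then
# stays constant no matter how large the Courant number (i.e. dt) is chosen.
lu, piv = lu_factor(A)

def hevi_implicit_solve(rhs_columns):
    """Back-substitute the stored LU factors for every vertical column.

    Columns are independent of one another, so this loop is embarrassingly
    parallel and maps naturally to one GPU thread block per column.
    """
    return np.stack([lu_solve((lu, piv), rhs) for rhs in rhs_columns])

rhs = np.random.rand(10, n_z)          # 10 columns of explicit right-hand sides (illustrative)
q_new = hevi_implicit_solve(rhs)
print(q_new.shape)                     # (10, 64)
```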
A unified approach for the numerical solution of the 3D hyperbolic Euler equations using high-order methods, namely continuous Galerkin (CG) and discontinuous Galerkin (DG) methods, is presented. First, we examine how classical CG, which uses a global storage scheme, can be constructed within the DG framework using constraint imposition techniques commonly used in the finite element literature. Then, we implement and test a simplified version in the Non-hydrostatic Unified Model of the Atmosphere (NUMA) for the case of explicit time integration and a diagonal mass matrix. Constructing CG within the DG framework allows CG to benefit from the desirable properties of DG, such as easier hp-refinement and better stability. Moreover, this representation allows for regional mixing of CG and DG depending on the flow regime in an area. The different flavors of CG and DG in the unified implementation are then tested for accuracy and performance using a suite of benchmark problems representative of cloud-resolving-scale, mesoscale, and global-scale atmospheric dynamics. The value of our unified approach is that we are able to show how to carry both CG and DG methods within the same code and to offer a simple recipe for converting an existing CG code to DG and vice versa.
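One way to read the "CG within DG" construction (a minimal sketch under simplifying assumptions, not NUMA's actual code): keep element-local DG storage, and enforce continuity by summing the duplicated degrees of freedom that share a grid point, i.e. a gather-scatter or direct stiffness summation (DSS) step. The 1D element layout and local-to-global map below are invented for illustration.

```python
import numpy as np

# Duplicated DG storage: 3 one-dimensional elements with 2 nodes each (illustrative sizes)
n_elem, n_nodes = 3, 2
r_dg = np.arange(n_elem * n_nodes, dtype=float).reshape(n_elem, n_nodes)

# Local-to-global map: shared element boundaries point to the same global node,
# which is the constraint that turns the discontinuous space into a C0 (CG) space.
l2g = np.array([[0, 1],
                [1, 2],
                [2, 3]])

def dss(r_dg, l2g, n_global=4):
    """Direct stiffness summation (gather-scatter): sum duplicated DG residuals
    into unique CG degrees of freedom, then scatter the shared values back."""
    r_cg = np.zeros(n_global)
    np.add.at(r_cg, l2g, r_dg)        # gather: enforce continuity by summing duplicates
    return r_cg[l2g]                  # scatter: element-wise view of the continuous field

print(dss(r_dg, l2g))
```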
A Navier-Stokes equations solver is parallelized to run on a cluster of computers using the domain decomposition method. Two approaches to communication and computation are investigated, namely synchronous and asynchronous methods. Asynchronous communication between subdomains is not commonly used in CFD codes; however, it has the potential to alleviate scaling bottlenecks incurred when processors must wait for one another at designated synchronization points. A common way to avoid this idle time is to overlap asynchronous communication with computation. For this to work, however, there must be something useful and independent a processor can do while waiting for messages to arrive. We investigate an alternative approach, namely conducting asynchronous iterations that improve the local subdomain solution while communication is in progress. An in-house CFD code is parallelized using the Message Passing Interface (MPI), and scalability tests are conducted which suggest that asynchronous iterations are a viable way of parallelizing CFD codes.
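A minimal sketch of the overlap idea (illustrative only; mpi4py rather than the in-house code, and a toy 1D Jacobi smoother standing in for the CFD solver): post nonblocking sends and receives for the subdomain halo, and instead of waiting, keep performing local relaxation sweeps with the stale halo values until the messages have arrived.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# One 1D subdomain per rank; the halo is the single value shared with a neighbour (ring topology).
u = np.zeros(100)
halo = np.zeros(1)
right = (rank + 1) % size
left = (rank - 1) % size

for step in range(10):
    # Post a nonblocking halo exchange instead of a blocking, synchronizing one.
    reqs = [comm.Isend(u[-1:], dest=right), comm.Irecv(halo, source=left)]

    # While the messages are in flight, keep iterating on the interior with the
    # *old* halo value (asynchronous iterations), rather than sitting idle.
    while not MPI.Request.Testall(reqs):
        u[1:-1] = 0.5 * (u[:-2] + u[2:])      # local Jacobi-type smoothing sweep

    # Messages have arrived: fold the fresh neighbour value into the boundary node.
    u[0] = halo[0]
```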