Abstract. We present an architecture-portable and
performant implementation of the atmospheric dynamical core (High-Order
Methods Modeling Environment, HOMME) of the Energy Exascale Earth System
Model (E3SM). The original Fortran implementation is highly performant and
scalable on conventional architectures using the Message Passing Interface
(MPI) and Open Multi-Processing (OpenMP) programming models.
We rewrite the model in C++ and use the Kokkos library to
express on-node parallelism in a largely architecture-independent
implementation. Kokkos provides an abstraction of a compute node or device,
layout-polymorphic multidimensional arrays, and parallel execution
constructs. The new implementation achieves the same or better performance on
conventional multicore computers and is portable to GPUs. We present
performance data for the original and new implementations on multiple
platforms, on up to 5400 compute nodes, and study several aspects of the
single- and multi-node performance characteristics of the new implementation
on conventional CPUs (e.g., Intel Xeon), many-core CPUs (e.g., Intel Xeon Phi Knights Landing),
and NVIDIA V100 GPUs.