We extract pixel-level masks of extreme weather patterns using variants of the Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and a parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak and sustained throughput of 1.13 EF/s and 999.0 PF/s, respectively.
The AFiD code, an open-source solver for the incompressible Navier-Stokes equations (http://www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU porting has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version; only a few routines have been manually rewritten. A new transpose scheme, which is not limited to the GPU version and can be applied to any CFD code that uses pencil-distributed parallelization, has been devised to improve the scaling of the Poisson solver, the main bottleneck of incompressible solvers. The GPU version reduces the wall-clock time by an order of magnitude compared to the CPU version for large meshes. Due to the increased performance and efficient use of memory, the GPU version of AFiD can perform simulations in parameter ranges that are unprecedented in thermally driven wall-bounded turbulence. To verify the accuracy of the code, turbulent Rayleigh-Bénard convection and plane Couette flow are simulated, and the results are in good agreement with experimental and computational data published in the previous literature.
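To illustrate the kind of redistribution a pencil-decomposed transpose performs, here is a minimal serial sketch in Python/NumPy that emulates the all-to-all exchange from x-pencils to y-pencils on a 2D slab; the function name and list-of-blocks representation are ours for illustration and do not come from the AFiD source, where this is done with MPI.

```python
import numpy as np

def x_to_y_pencils(x_pencils):
    """Redistribute x-pencils to y-pencils (an all-to-all exchange, emulated serially).

    x_pencils[r] holds the full x extent and rank r's slice of y;
    the result y_pencils[r] holds rank r's slice of x and the full y extent.
    """
    P = len(x_pencils)                  # number of ranks
    nx, ny_loc = x_pencils[0].shape
    nx_loc = nx // P
    y_pencils = []
    for dest in range(P):
        # "Pack": each source rank cuts out the x block destined for rank `dest`.
        # "Unpack": rank `dest` concatenates the received blocks along y.
        blocks = [x_pencils[src][dest * nx_loc:(dest + 1) * nx_loc, :]
                  for src in range(P)]
        y_pencils.append(np.concatenate(blocks, axis=1))
    return y_pencils
```

In a real pencil-decomposed solver each list entry lives on a different MPI rank and the block exchange is an `MPI_Alltoall`; the pack/unpack ordering above is what determines how much local copying the transpose costs.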
PROGRAM SUMMARY
Program Title: AFiD-GPU
Licensing provisions: GPLv3
Programming language: Fortran 90, CUDA Fortran, MPI
External routines: PGI, CUDA Toolkit, FFTW3, HDF5
Nature of problem: Solving the three-dimensional Navier-Stokes equations coupled with a scalar field in a box bounded by two walls, with the other four boundaries periodic.
Solution method: Second-order finite-difference spatial discretization; third-order Runge-Kutta scheme and Crank-Nicolson method for time advancement; two-dimensional pencil-distributed MPI parallelization; GPU-accelerated routines.
Additional comments: The open-source code is supported and updated at http://www.afid.eu.
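The semi-implicit time advancement named in the solution method can be sketched for a scalar model problem du/dt = N(u) + λu, with the nonlinear term N advanced explicitly by a low-storage third-order Runge-Kutta scheme and the stiff linear term by Crank-Nicolson at each substep. This is a minimal sketch under standard assumptions (the commonly used Rai-Moin substep coefficients); the AFiD source may differ in detail, and for a scalar λ the implicit solve reduces to a division.

```python
def rk3_cn_step(u, dt, nonlinear, lam):
    """One full time step for du/dt = N(u) + lam*u.

    N(u) is treated explicitly with a three-substep low-storage RK3
    (Rai-Moin coefficients); lam*u is treated with Crank-Nicolson
    over each substep of size (gamma + rho)*dt.
    """
    gam = (8.0 / 15.0, 5.0 / 12.0, 3.0 / 4.0)
    rho = (0.0, -17.0 / 60.0, -5.0 / 12.0)
    n_old = 0.0
    for g, r in zip(gam, rho):
        a = g + r                       # fraction of dt covered by this substep
        n_new = nonlinear(u)
        # Explicit RK3 combination of N, plus the CN half of the linear term:
        rhs = u + dt * (g * n_new + r * n_old) + 0.5 * a * dt * lam * u
        u = rhs / (1.0 - 0.5 * a * dt * lam)   # implicit CN solve (scalar case)
        n_old = n_new
    return u
```

In the PDE setting the division becomes a tridiagonal solve in the wall-normal direction, which is why the Crank-Nicolson treatment is affordable while removing the viscous stability restriction.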
Researchers have recently used the new programmable capabilities of the Graphics Processing Unit (GPU) to increase the performance of scientific codes. We investigate the use of a cluster of GPUs for large-scale CFD problems and show order-of-magnitude increases in performance and performance-to-price ratio. We implement two separate compressible flow solvers. First, we develop a CUDA-based solver for the 2D compressible Euler equations and verify the results against a reference multi-block code, MBFLO. After demonstrating the performance of our Euler solver, we proceed to develop a new version of MBFLO by adding GPU-accelerated subroutines to the existing Fortran codebase. Using an eight-node cluster equipped with 16 NVIDIA 9800GX2 GPUs, we achieve speedups of up to 496x on our Euler solver and 88x on MBFLO. This paper describes the numerical, hardware, and software techniques that provide these significant speedups.
This work presents the GPU acceleration of the open-source code CaNS for very fast massively-parallel simulations of canonical fluid flows. The distinct feature of the many-CPU Navier-Stokes solver in CaNS is its fast direct solver for the second-order finite-difference Poisson equation, based on the method of eigenfunction expansions. The solver implements all the boundary conditions valid for this type of problem in a unified framework. Here, we extend the solver to GPU-accelerated clusters using CUDA Fortran. The porting makes extensive use of CUF kernels and has been greatly simplified by the unified memory feature of CUDA Fortran, which handles the data migration between host (CPU) and device (GPU) without defining new arrays in the source code. The overall implementation is open source under the terms of an MIT license.
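The eigenfunction-expansion idea behind this class of direct Poisson solvers can be shown in its simplest setting: in a periodic direction, the second-order central-difference Laplacian is diagonalized by Fourier modes with modified wavenumbers, so the solve reduces to a division per mode. The following Python/NumPy sketch solves the 1D periodic case; the function name is ours, and CaNS itself handles the general 3D problem with FFTs in two directions and a tridiagonal solve in the third.

```python
import numpy as np

def poisson_periodic_1d(f, L):
    """Solve u'' = f with periodic BCs on [0, L) via the eigenfunction
    (Fourier) expansion of the second-order central-difference Laplacian."""
    n = f.size
    dx = L / n
    fhat = np.fft.rfft(f)
    k = np.arange(fhat.size)
    # Exact eigenvalues of the discrete Laplacian ("modified wavenumbers"):
    lam = (2.0 * np.cos(2.0 * np.pi * k / n) - 2.0) / dx**2
    lam[0] = 1.0            # avoid division by zero for the mean mode
    uhat = fhat / lam
    uhat[0] = 0.0           # fix the solution's mean to zero
    return np.fft.irfft(uhat, n)
```

Using the modified wavenumbers (rather than the exact ones) makes the solver consistent with the finite-difference discretization to machine precision, which is what makes it a *direct* solver rather than an approximate spectral one.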