The expedient design of precision components in aerospace and other high-tech industries requires simulations of physical phenomena that are often described by partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution that is difficult to achieve in reasonable amounts of time, even in effectively parallelized solvers. Though the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory communication and access. The swept time-space decomposition rule reduces communication between subdomains by exhausting the domain of influence before communicating boundary values. Here we present a GPU implementation of the swept rule, which modifies the algorithm for improved performance on this processing architecture by prioritizing the use of private (shared) memory, avoiding interblock communication, and overwriting unnecessary values. It shows significant improvement in the execution time of finite-difference solvers for one-dimensional unsteady PDEs, producing speedups of 2-9× across a range of problem sizes compared with simple GPU versions, and 7-300× compared with parallel CPU versions. However, for a more sophisticated one-dimensional system of equations discretized with a second-order finite-volume scheme, the swept rule performs 1.2-1.9× worse than a standard implementation for all problem sizes.

A motivating goal of this work is real-time execution, that is, simulation at the speed of nature, in accordance with the high-performance computing development goals set out in the CFD Vision 2030 report [1]. Classic approaches to domain decomposition for parallelized, explicit, time-stepping partial differential equation (PDE) solvers incur substantial computational performance costs from the communication between nodes required every timestep.
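The core idea of the swept rule, exhausting a subdomain's domain of influence before exchanging boundary values, can be illustrated with a minimal sketch. The following is a hypothetical illustration using a first-order explicit heat-equation stencil, not the paper's GPU kernel: each local substep shrinks the region of valid points by one on each side, so a block of width n can advance roughly n/2 timesteps with no neighbor communication (the "triangle" phase of the swept rule).

```python
import numpy as np

def swept_triangle(u, nu=0.25):
    """Advance a 1-D explicit heat-equation stencil multiple substeps
    on a local block WITHOUT exchanging boundary values. After `step`
    substeps, only points whose full domain of dependence lies inside
    the block remain valid, so the valid interior shrinks each level.
    Hypothetical sketch for illustration only."""
    u = u.copy()
    n = len(u)
    levels = [u.copy()]
    for step in range(1, n // 2):
        new = u.copy()
        # Only indices [step, n - step) are still computable from
        # block-local data after `step` substeps.
        for i in range(step, n - step):
            new[i] = u[i] + nu * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new
        levels.append(u.copy())
    return levels  # one array per locally computed time level

# Usage: a block of 16 points advances 7 substeps before any
# neighbor communication is needed.
block = np.linspace(0.0, 1.0, 16)
history = swept_triangle(block)
print(len(history))
```

Only after this triangle is exhausted are the accumulated edge values communicated, amortizing one exchange over many substeps instead of one exchange per substep.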
This communication cost consists of two parts: latency and bandwidth, where latency is the fixed cost of each communication event and bandwidth is the variable cost that depends on the amount of data transferred. Latency in inter-node communication is a fundamental barrier to this goal, and advancements in network latency have historically been slower than improvements in other computing performance barriers such as bandwidth and computational power [2]. Performance may be improved by avoiding external node communication until exhausting the domain of dependence, allowing the calculation to advance multiple timesteps while requiring a smaller number of communication events. This idea is the basis of swept time-space decomposition [3,4].

Extreme-scale computing clusters have recently been used to solve the compressible Navier-Stokes equations on over 1.97 million CPU cores [5]. The monetary cost, power consumption, and size of such a cluster impede the realization of the widespread peta- and exascale computing required for real-time, high-fidelity CFD simulations. While these are significant challenges, they also pr...
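The latency/bandwidth split described above is commonly captured by the "alpha-beta" cost model, T = α + n/β, where α is the per-message latency and β the bandwidth. A small sketch, with illustrative (not measured) parameter values, shows why batching many timesteps into one exchange pays off when latency dominates:

```python
def comm_time(n_bytes, latency_s=1e-6, bandwidth_Bps=10e9):
    """Latency-bandwidth ('alpha-beta') model: T = alpha + n / beta.
    Parameter values are illustrative assumptions, not measurements."""
    return latency_s + n_bytes / bandwidth_Bps

# Classic decomposition: exchange one 8-byte boundary value every
# timestep, for 1000 timesteps.
classic = 1000 * comm_time(8)

# Swept-style decomposition: advance ~100 timesteps per exchange,
# sending the accumulated boundary data (100 x 8 bytes) each time.
swept = 10 * comm_time(100 * 8)

print(classic / swept)  # communication-time reduction factor
```

With these assumed numbers the swept schedule pays the fixed latency cost 10 times instead of 1000, so total communication time drops by well over an order of magnitude even though the same amount of data crosses the network.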