Efficient SIMD solution of multiple systems of stiff IVPs

Kroshko, Andrew; Spiteri, Raymond J.

doi:10.1016/j.jocs.2012.08.017

Cited by 9 publications

(7 citation statements)

References 18 publications


“…Such methods are generally not as stable as fully implicit RK methods; however, the reduced stability, which manifests itself in terms of smaller stable step sizes, is often more than made up for by the reduced computational expense per step [Hairer and Wanner 1996; Sandu et al. 1997]. Furthermore, the implementation of linearly implicit methods is typically much easier, less subject to tuning of convergence parameters, etc., and in some situations, due to their predictable number of operations per iteration, may be more amenable to parallelization [Kroshko and Spiteri 2013]. An IMEX ARK method can be used as a linearly implicit RK method by using the Jacobian $J_f(t, y(t))$ of $f(t, y(t))$ to split the RHS into linear and nonlinear parts [Cooper and Sayfy 1983] as $f^{[1]}(t, y) = J_f(t, y(t)) \cdot y$,…”

Section: Methods (mentioning)

confidence: 99%

odeToJava

Kroshko

Spiteri

2015

ACM Trans. Math. Softw.

Self Cite


Problem-solving environments (PSEs) offer a powerful yet flexible and convenient means for general experimentation with computational methods, algorithm prototyping, and visualization and manipulation of data. Consequently, PSEs have become the modus operandi of many computational scientists and engineers. However, despite these positive aspects, PSEs typically do not offer the level of granularity required by the specialist or algorithm designer to conveniently modify the details. In other words, the level at which PSEs are black boxes is often still too high for someone interested in modifying an algorithm as opposed to trying an alternative. In this article, we describe odeToJava, a Java-based PSE for initial-value problems in ordinary differential equations. odeToJava implements explicit and linearly implicit implicit-explicit Runge-Kutta methods with error and stepsize control and intra-step interpolation (dense output), giving the user control and flexibility over the implementational aspects of these methods. We illustrate the usage and functionality of odeToJava by means of computational case studies of initial-value problems (IVPs).
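The citation statement above notes that linearly implicit methods need a predictable number of operations per step because they replace the nonlinear solve of a fully implicit method with a single linear solve. A minimal sketch of that idea follows, using the linearly implicit Euler method (the simplest Rosenbrock-type scheme) rather than the IMEX ARK methods of odeToJava. The scalar test problem, the step size, and all function names are illustrative assumptions, not taken from either paper.

```python
def f(t, y):
    """Stiff scalar test problem y' = -1000 (y - 1); an illustrative choice."""
    return -1000.0 * (y - 1.0)


def jac(t, y):
    """Jacobian df/dy of the test problem above."""
    return -1000.0


def explicit_euler_step(t, y, h):
    """Plain explicit Euler, shown only for comparison."""
    return y + h * f(t, y)


def linearly_implicit_euler_step(t, y, h):
    """Solve (1 - h*J) k = f(t, y) once; no Newton iteration is needed."""
    k = f(t, y) / (1.0 - h * jac(t, y))
    return y + h * k


def integrate(step, y0, h, n_steps):
    """Apply a one-step method n_steps times starting from t = 0."""
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(t, y, h)
        t += h
    return y


h = 0.01  # h*|J| = 10, well beyond explicit Euler's stability limit of 2
y_lin = integrate(linearly_implicit_euler_step, 0.0, h, 10)
y_exp = integrate(explicit_euler_step, 0.0, h, 10)
```

With this step size the distance to the steady state y = 1 shrinks by a factor of 11 per step for the linearly implicit scheme, while it grows by a factor of 9 in magnitude per step for explicit Euler. For a system of equations the division becomes one linear solve with the matrix I - hJ, which is the fixed per-step cost the statement refers to.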



“…The CPU thread issues explicit SIMD instructions to enact operations across the parallel lanes of the vector units. Kroshko and Spiteri [33] demonstrated this approach in their SIMD implementation of a RODAS Rosenbrock solver. There, they reported a speed-up of 1.89× (i.e., 94% parallel efficiency) when solving multiple systems of stiff IVPs on a cell broadband engine.…”

Section: Parallel Integrator Implementations (mentioning)

confidence: 99%

Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

Stone

Alferman

Niemeyer

2018

Computer Physics Communications


Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting, allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi coprocessor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13% to 35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7×) and Xeon Phi coprocessor (4.7-4.9×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.4-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.

arXiv:1608.05794v2 [physics.comp-ph], 29 Aug 2017
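The divergence effect described in this abstract can be illustrated with a small model: several independent ODEs advance in lockstep, one per vector lane, and a lane that has finished must idle (be masked out) until the slowest lane is done. The sketch below is an assumption-laden toy, not the OpenCL implementation from the paper. A plain Python loop stands in for one vector instruction, the decay problem y' = -rate * y is a hypothetical workload, and a per-lane step count proportional to the rate stands in for an adaptive step-size controller.

```python
def lockstep_integrate(rates, base_steps=10, t_end=1.0):
    """Integrate y' = -rate * y, y(0) = 1, for every lane in lockstep.

    Each lane takes base_steps * rate explicit Euler steps (a stand-in for
    an adaptive controller choosing smaller steps for faster dynamics).
    Returns the final values, the number of lockstep iterations, and the
    fraction of lane slots that performed useful work.
    """
    n_lanes = len(rates)
    n_steps = [base_steps * r for r in rates]
    h = [t_end / n for n in n_steps]
    y = [1.0] * n_lanes
    done = [0] * n_lanes
    iterations = 0
    useful = 0
    # Keep issuing "vector instructions" until the slowest lane finishes.
    while any(d < n for d, n in zip(done, n_steps)):
        iterations += 1
        for i in range(n_lanes):
            if done[i] < n_steps[i]:  # lane mask: finished lanes sit idle
                y[i] += h[i] * (-rates[i] * y[i])
                done[i] += 1
                useful += 1
    return y, iterations, useful / (iterations * n_lanes)


# Identical lanes keep every slot busy; mixed lanes leave slots idle.
_, uniform_iters, uniform_eff = lockstep_integrate([4, 4, 4, 4])
_, mixed_iters, mixed_eff = lockstep_integrate([1, 2, 4, 8])
```

In this toy, four identical lanes reach a utilization of 1.0, while lanes needing 10, 20, 40, and 80 steps reach only 150/320, about 0.47, because the vector runs for 80 iterations regardless. Adding more lanes with differing step counts can only keep the iteration count the same or raise it, which is consistent with the abstract's observation that wider vectors incur more divergence.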


“…Thus, every thread in a warp may have similar collapse strength and similar amount of slow down during the collapse phase. Such a "clustering" technique is already suggested by [90]. The number of registers required to avoid spilling in the Keller-Miksis test case is 184.…”

Section: Test Case: Pressure Relief Valve (Impact Dynamics) (mentioning)

confidence: 99%

Modular, general purpose ODE integration package to solve large number of independent ODE systems on GPUs

Hegedűs

2018

Preprint


A general purpose, modular program package for the integration of a large number of independent ordinary differential equation systems capable of using professional graphics cards is presented. The available numerical schemes are the explicit and adaptive Runge-Kutta-Cash-Karp algorithm and the explicit fourth order Runge-Kutta method with fixed time step. In order to harness the huge processing power of graphics cards, the intermediate points of the computed trajectories are not stored. To compensate, with pre-declared device functions, the required special features or properties of a solution can be easily extracted and each stored in a dedicated variable, for instance, the maximum and minimum values and/or their time instances. Event handling is also incorporated into the package in order to detect special points which can be stored as well. Moreover, again with pre-declared device function calls at such special points, the efficient handling of nonsmooth dynamics (e.g., impact dynamics) is possible. Several test cases are presented to demonstrate the flexibility of the pre-declared device functions and the strength of the program package. The applied models are the simple Duffing oscillator, the more complex Keller-Miksis equation known in bubble dynamics, and a system describing the behaviour of a pressure relief valve that can exhibit impact dynamics.
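The storage strategy described in this abstract (discard the trajectory, keep only selected features such as extrema and their time instances) can be sketched on the CPU as follows. This is a hedged illustration in plain Python, not the package's GPU code or API. The function names, the Duffing coefficients, and the parameter sets are all hypothetical.

```python
import math


def rk4_step(rhs, t, state, h, params):
    """One classical fourth-order Runge-Kutta step with fixed step size h."""
    k1 = rhs(t, state, params)
    k2 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)], params)
    k3 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)], params)
    k4 = rhs(t + h, [s + h * k for s, k in zip(state, k3)], params)
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def duffing(t, state, params):
    """Duffing oscillator x'' + delta*x' + x + x**3 = gamma*cos(t).

    Unit stiffness coefficients and unit forcing frequency are illustrative
    assumptions, not values from the cited package.
    """
    x, v = state
    delta, gamma = params
    return [v, -delta * v - x - x ** 3 + gamma * math.cos(t)]


def integrate_many(rhs, initial_states, param_sets, h, n_steps):
    """Integrate independent systems, keeping only extrema of component 0."""
    summaries = []
    for state, params in zip(initial_states, param_sets):
        state = list(state)
        x_max = x_min = state[0]
        t_max = t_min = 0.0
        for step in range(1, n_steps + 1):
            state = rk4_step(rhs, (step - 1) * h, state, h, params)
            t = step * h
            # Update the running features; the trajectory itself is dropped.
            if state[0] > x_max:
                x_max, t_max = state[0], t
            if state[0] < x_min:
                x_min, t_min = state[0], t
        summaries.append({"x_max": x_max, "t_max": t_max,
                          "x_min": x_min, "t_min": t_min})
    return summaries


# Two hypothetical (delta, gamma) parameter sets with the same initial state.
results = integrate_many(duffing, [[1.0, 0.0], [1.0, 0.0]],
                         [(0.1, 0.3), (0.2, 0.5)], h=0.01, n_steps=2000)
```

Memory use per system is constant in the number of steps, which is the property that matters when many systems share limited device memory. On a GPU the outer loop over systems would be distributed across threads; here it is a sequential loop for clarity.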

