Abstract: Distributing a simulation across many machines can drastically speed up computations and increase detail. The computing cloud provides tremendous computing resources, but weak service guarantees force programs to manage significant system complexity: nodes, networks, and storage occasionally perform poorly or fail.
We describe Nimbus, a system that automatically distributes grid-based and hybrid simulations across cloud computing nodes. The main simulation loop is sequential code and launches distrib…
“…Parallel INSE solvers for multicore systems were developed using OpenMP [67] and extended to multinode cluster systems using MPI [68]. In computer graphics, Mashayekhi et al [69] proposed a system called "Nimbus", which automatically distributes grid-based and hybrid simulations across cloud computing nodes for faster execution at higher grid or particle resolutions.…”
Fluid simulations are often performed using the incompressible Navier-Stokes equations (INSE), leading to sparse linear systems which are difficult to solve efficiently in parallel. Recently, kinetic methods based on the adaptive-central-moment multiple-relaxation-time (ACM-MRT) model [1], [2] have demonstrated impressive capabilities to simulate both laminar and turbulent flows, with quality matching or surpassing that of state-of-the-art INSE solvers. Furthermore, due to its local formulation, this method presents the opportunity for highly scalable implementations on parallel systems such as GPUs. However, an efficient ACM-MRT-based kinetic solver needs to overcome a number of computational challenges, especially when dealing with complex solids inside the fluid domain. In this paper, we present multiple novel GPU optimization techniques to efficiently implement high-quality ACM-MRT-based kinetic fluid simulations in domains containing complex solids. Our techniques include a new communication-efficient data layout, a load-balanced immersed-boundary method, a multi-kernel launch method using a simplified formulation of ACM-MRT calculations to enable greater parallelism, and the integration of these techniques into a parametric cost model to enable automated parameter search to achieve optimal execution performance. We also extended our method to multi-GPU systems to enable large-scale simulations. To demonstrate the state-of-the-art performance and high visual quality of our solver, we present extensive experimental results and comparisons to other solvers.
“…Although there has been research utilising multiple servers to distribute the task of solving physics based problems (e.g., [Mashayekhi et al 2018]), to the best of our knowledge there is no literature describing real-time interactive physics exploiting the addition of servers to gain scalability. The closest work to our research is that carried out to seek scalability in terms of player numbers in online gaming in the field of Distributed Virtual Environments (DVEs).…”
In this paper we propose a solution for delivering scalable real-time physics simulations. Although high-performance computing simulations of physics-related problems do exist, these are not real-time and do not model the intricate real-time interactions of rigid bodies for visual effect that are common in video games (they favour accuracy over real-time performance). As such, this paper presents the first approach to real-time delivery of scalable, commercial-grade, video-game-quality physics. This is achieved by taking the physics engine out of the player's machine and deploying it across standard cloud-based infrastructures. The simulation world is divided into sections that are then allocated to servers. A server maintains the physics for all simulated objects in its section. Our contribution is the ability to maintain a scalable simulation by allowing object interaction across section boundaries using predictive migration techniques. We allow each object to project an aura that is used to determine object migration across servers to ensure seamless physics interactions between objects. The validity of our results is demonstrated through experimentation and benchmarking. Our approach allows player interaction at any point in real-time (influencing the simulation) in the same manner as any video game. We believe that this is the first successful demonstration of scalable real-time physics.
CCS Concepts: • Software and its engineering → Interactive games; Cloud computing;
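The aura idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the one-dimensional world, the `Body` class, the fixed `section_width`, and both helper functions are hypothetical, chosen only to show how an aura that overlaps a neighbouring section could flag an object as a migration candidate before it crosses the boundary.

```python
from dataclasses import dataclass


@dataclass
class Body:
    """Hypothetical simulated object in a 1D world."""
    x: float     # position along the axis that is split into sections
    aura: float  # aura radius projected around the object


def owner_section(body, section_width, num_sections):
    """Index of the section (server) that currently owns the body."""
    idx = int(body.x // section_width)
    return min(max(idx, 0), num_sections - 1)


def aura_neighbours(body, section_width, num_sections):
    """Sections other than the owner that the body's aura overlaps.

    A non-empty result marks the body as a candidate for migration
    (or for being mirrored) on those neighbouring servers.
    """
    lo = int((body.x - body.aura) // section_width)
    hi = int((body.x + body.aura) // section_width)
    lo = max(lo, 0)
    hi = min(hi, num_sections - 1)
    owner = owner_section(body, section_width, num_sections)
    return [s for s in range(lo, hi + 1) if s != owner]
```

With four sections of width 10, a body at `x=9.5` with an aura of 1.0 is owned by section 0 but also overlaps section 1, while a body at `x=5.0` with the same aura overlaps no neighbour.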
“…Running a simulation 10 times faster on 10 times more nodes costs the same but completes an order of magnitude faster. Recent work has shown that single‐threaded complex simulations can be automatically distributed to run on over a thousand cores in the cloud, drastically speeding up simulations and increasing their details [MSQ*18].…”
Graphical fluid simulations are CPU‐bound. Parallelizing simulations on hundreds of cores in the computing cloud would make them faster, but requires evenly balancing load across nodes. Good load balancing depends on manual decisions from experts, which are time‐consuming and error prone, or dynamic approaches that estimate and react to future load, which are non‐deterministic and hard to debug.
This paper proposes Birdshot scheduling, an automatic and purely static load balancing algorithm whose performance is close to expert decisions and reactive algorithms without their difficulty or complexity. Birdshot scheduling's key insight is to leverage the high‐latency, high‐throughput, full bisection bandwidth of cloud computing nodes. Birdshot scheduling splits the simulation domain into many micro‐partitions and statically assigns them to nodes randomly. Analytical results show that randomly assigned micro‐partitions balance load with high probability. The high‐throughput network easily handles the increased data transfers from micro‐partitions, and full bisection bandwidth allows random placement with no performance penalty. Overlapping the communications and computations of different micro‐partitions masks latency.
Experiments with particle‐level set, SPH, FLIP and explicit Eulerian methods show that Birdshot scheduling speeds up simulations by a factor of 2‐3, and can out‐perform reactive scheduling algorithms. Birdshot scheduling performs within 21% of state‐of‐the‐art dynamic methods that require running a second, parallel simulation. Unlike speculative algorithms, Birdshot scheduling is purely static: it requires no controller, runtime data collection, partition migration or support for these operations from the programmer.
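The static assignment described in the Birdshot abstract can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names, the uniform random draw per micro-partition, the per-partition cost list, and the max-over-mean imbalance metric are all assumptions made for the example.

```python
import random


def birdshot_assign(num_partitions, num_nodes, seed=0):
    """Statically assign each micro-partition to a uniformly random node.

    Assumption: an independent uniform draw per partition; the seed makes
    the assignment reproducible, so there is no runtime rebalancing.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_nodes) for _ in range(num_partitions)]


def contiguous_assign(num_partitions, num_nodes):
    """Baseline for comparison: give each node one contiguous block."""
    return [i * num_nodes // num_partitions for i in range(num_partitions)]


def node_loads(assignment, costs, num_nodes):
    """Sum the (hypothetical) per-partition costs on each node."""
    loads = [0.0] * num_nodes
    for partition, node in enumerate(assignment):
        loads[node] += costs[partition]
    return loads


def imbalance(loads):
    """Ratio of the most loaded node to the mean load (1.0 is perfect)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


if __name__ == "__main__":
    # Hypothetical workload: 128 expensive partitions clustered together
    # (for example, where the fluid is), 896 cheap ones elsewhere.
    costs = [1.0] * 128 + [0.1] * 896
    random_loads = node_loads(birdshot_assign(1024, 8), costs, 8)
    block_loads = node_loads(contiguous_assign(1024, 8), costs, 8)
    print("random micro-partitions:", round(imbalance(random_loads), 2))
    print("contiguous blocks:      ", round(imbalance(block_loads), 2))
```

In this toy workload the contiguous baseline places all the expensive partitions on one node, whereas the random assignment spreads them across nodes, which is the intuition behind the abstract's claim that randomly assigned micro-partitions balance load with high probability.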