Abstract: Achieving high scalability with dynamically adaptive algorithms in high-performance computing (HPC) is a non-trivial task. The invasive paradigm using compute migration represents an efficient alternative to classical data migration approaches for such algorithms in HPC. We present a core-distribution scheduler which realizes the migration of computational power by distributing cores according to the requirements specified by one or more parallel program instances. We validate our approach with different be…
“…We see that we can, at little loss of efficiency, for many setups reduce the number of used cores. For codes deploying multiple MPI ranks per node, other ranks then can grab these freed cores [9].…”
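As a rough illustration of the core-distribution idea, and not the scheduler presented in the paper, the following C++ sketch hands out a node's cores to parallel program instances in proportion to the demand each instance reports; cores an instance does not claim stay free, so other ranks on the node can grab them. All names (CoreRequest, distribute_cores) and the numbers in main are invented for this example.

// Toy core-distribution sketch: grant each instance at least one core, then
// split the remaining cores proportionally to the reported demand. Cores
// that remain ungranted are free to be grabbed by other ranks on the node.
// Assumes total_cores >= number of instances.
#include <algorithm>
#include <iostream>
#include <vector>

struct CoreRequest {
    int instance_id;
    int desired_cores;  // cores this program instance asks for in the current phase
};

std::vector<int> distribute_cores(const std::vector<CoreRequest>& reqs, int total_cores) {
    std::vector<int> granted(reqs.size(), 1);  // one core each as a baseline
    int remaining = total_cores - static_cast<int>(reqs.size());
    int total_demand = 0;
    for (const auto& r : reqs) total_demand += std::max(0, r.desired_cores - 1);
    for (std::size_t i = 0; i < reqs.size() && total_demand > 0; ++i) {
        int extra = (std::max(0, reqs[i].desired_cores - 1) * remaining) / total_demand;
        granted[i] += std::min(extra, reqs[i].desired_cores - granted[i]);
    }
    return granted;
}

int main() {
    std::vector<CoreRequest> reqs = {{0, 8}, {1, 2}, {2, 4}};
    std::vector<int> granted = distribute_cores(reqs, 12);
    for (std::size_t i = 0; i < reqs.size(); ++i)
        std::cout << "instance " << reqs[i].instance_id << " gets " << granted[i] << " cores\n";
    return 0;
}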
With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussion. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning is important here, as the grid and task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms, and we clarify that such an approach does not free users completely from grain size awareness.
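To make the grain-size question concrete, here is a minimal sketch that replaces the machine-learning tuner described in the abstract with a plain exhaustive search: the same task-based kernel is timed under a handful of candidate grain sizes and the fastest one is kept. The kernel process_chunk, the problem size, and the candidate set are purely illustrative; a real tuner would re-evaluate its choice as the grid and task workload change at runtime.

// Minimal grain-size search: time the kernel split into tasks of `grain`
// elements and remember the grain size with the shortest wall time.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

static void process_chunk(std::vector<double>& data, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) data[i] = data[i] * 1.0000001 + 1.0;  // stand-in work
}

static double timed_run(std::vector<double>& data, std::size_t grain) {
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t begin = 0; begin < data.size(); begin += grain)
        process_chunk(data, begin, std::min(begin + grain, data.size()));
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<double> data(1 << 22, 1.0);
    const std::size_t candidates[] = {1u << 8, 1u << 10, 1u << 12, 1u << 14, 1u << 16};
    std::size_t best_grain = 0;
    double best_time = 1e30;
    for (std::size_t grain : candidates) {
        double t = timed_run(data, grain);
        std::cout << "grain " << grain << ": " << t << " s\n";
        if (t < best_time) { best_time = t; best_grain = grain; }
    }
    std::cout << "selected grain size: " << best_grain << "\n";
    return 0;
}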
The current static job scheduling on supercomputers for MPI-based applications is well known to be a limiting factor for the exploitation of a system's top performance in terms of application throughput. Hence, allowing fully flexible and dynamically varying job sizes would provide multiple advantages compared to the current approach, e.g., by prioritizing jobs dynamically and optimizing resource usage by transferring resources economically. A critical step in achieving dynamic resource management with MPI on supercomputers is the development of sound and robust interfaces between MPI applications and the runtime system. Our approach extends the concept of MPI Sessions, newly introduced with MPI 4.0, by adding features to support varying computing resources via the MPI process set abstraction. We then show how these features can be used, as a proof of concept, to request (active) and cope with (passive) varying resources from an application's perspective. To validate our approach, we develop libmpidynres, a C library providing an emulated MPI Sessions environment on top of existing MPI implementations without MPI Sessions support, which we then use to integrate our proposed extensions to the interface specification. Using this proof-of-concept environment, we show how an MPI Sessions-enabled application can use process sets to handle dynamically varying resources.
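For orientation, the following snippet shows the standard MPI 4.0 Sessions bootstrap that such extensions build on: a group and then a communicator are derived from the built-in process set "mpi://WORLD" instead of MPI_COMM_WORLD. The dynamically varying process sets proposed in the paper, and the libmpidynres emulation layer, would expose additional set names and calls that are not reproduced here.

// Plain MPI 4.0 Sessions bootstrap (no dynamic resources involved):
// initialise a session, turn the built-in "mpi://WORLD" process set into a
// group, and create a communicator from that group.
#include <mpi.h>
#include <cstdio>

int main() {
    MPI_Session session;
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

    MPI_Group group;
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);

    MPI_Comm comm;
    MPI_Comm_create_from_group(group, "example.sessions.bootstrap",
                               MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);

    int rank = -1, size = -1;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    std::printf("rank %d of %d, built from a process set\n", rank, size);

    MPI_Comm_free(&comm);
    MPI_Group_free(&group);
    MPI_Session_finalize(&session);
    return 0;
}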
“…If a job exceeds the specified time limit then usually the job is cancelled from the job scheduling system. A different interesting approach to manage resources is the field of invasive computing [32], where a job can request and release resources dynamically while it is running. This helps to share resources while executing many jobs in parallel.…”
Section: Idling With Standard Scheduling Techniques
To foster predictive simulations, a variety of methods have recently been developed to efficiently tackle uncertainty quantification (UQ) in complex, computationally intensive problems. Many of these methods are non-intrusive and, thus, result in a (large) number of embarrassingly parallel black-box evaluations of the underlying simulation codes. While the focus of development is typically on the number of black-box evaluations, which represents the bulk of the computational workload, an additional level of potential performance gains exists. In many scenarios, uncertain input leads not only to uncertain outputs, but also to a varying and thus stochastic runtime of the simulation codes. For scheduling the individual black-box runs, this information is typically not taken into account, resulting in non-negligible idling times on parallel systems. In this contribution, we compare a variety of scheduling strategies for non-intrusive UQ scenarios using the non-intrusive polynomial chaos approach. In particular, we propose to construct a surrogate model for the runtime of the application using the identical UQ methodology as for the original problem. Using this model to predict the runtimes of subsequent black-box runs allows for (heuristic) optimization of the scheduling. The method has been tested for the forward quantification of uncertainty on academic models and on a pedestrian simulation in the context of evacuation scenarios. This approach allows for speed-up factors of about two for the total runtime and can be generalised to a large variety of applications that incorporate parameter-dependent runtime.
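As an illustration of how predicted runtimes can drive the scheduling, the sketch below uses a longest-processing-time-first heuristic, which is one possible choice and not necessarily the strategy evaluated in the paper: jobs are sorted by their surrogate-predicted runtime and assigned greedily to the currently least-loaded worker. The surrogate model itself is not shown; Job, schedule_lpt, and the sample numbers are illustrative.

// Longest-processing-time-first assignment of black-box runs to workers,
// driven by predicted runtimes.
#include <algorithm>
#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>

struct Job {
    int id;
    double predicted_runtime;  // e.g. obtained from a runtime surrogate model
};

// Returns, for each worker, the list of job ids assigned to it.
std::vector<std::vector<int>> schedule_lpt(std::vector<Job> jobs, int num_workers) {
    std::sort(jobs.begin(), jobs.end(),
              [](const Job& a, const Job& b) { return a.predicted_runtime > b.predicted_runtime; });
    using Load = std::pair<double, int>;  // (accumulated load, worker index)
    std::priority_queue<Load, std::vector<Load>, std::greater<Load>> least_loaded;
    for (int w = 0; w < num_workers; ++w) least_loaded.push({0.0, w});
    std::vector<std::vector<int>> assignment(num_workers);
    for (const Job& j : jobs) {
        Load top = least_loaded.top();
        least_loaded.pop();
        assignment[top.second].push_back(j.id);
        least_loaded.push({top.first + j.predicted_runtime, top.second});
    }
    return assignment;
}

int main() {
    std::vector<Job> jobs = {{0, 3.2}, {1, 0.8}, {2, 2.5}, {3, 1.1}, {4, 4.0}};
    std::vector<std::vector<int>> plan = schedule_lpt(jobs, 2);
    for (std::size_t w = 0; w < plan.size(); ++w) {
        std::cout << "worker " << w << ":";
        for (int id : plan[w]) std::cout << " job" << id;
        std::cout << "\n";
    }
    return 0;
}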