Part 5: Industrial Management and Other Applications
Silicon technology will continue to shrink in line with Moore's law. Device variability will increase because of reduced controllability during chip fabrication, and the mean time between failures will decrease accordingly. Current methodologies based on error detection and thread re-execution (rollback) may not be sufficient once the number of errors reaches a certain threshold. This dynamic scenario can be very harmful when programs run on HPC systems where a correct, accurate and time-constrained solution is expected. The objective of this paper is to describe and analyse the needs and constraints of different applications studied in disaster management processes. These applications fall mainly within the domain of High Performance Computing (HPC). Even though this domain can vary in terms of computation needs, system form factor and power consumption, it nevertheless shares some commonalities.
An increasing number of high-performance applications demand some form of time predictability, in particular in scenarios where correctness depends on both performance and timing requirements, and failure to meet either of them is critical. Consequently, a more predictable HPC system is required, particularly for an emerging class of adaptive real-time HPC applications. Here we present our runtime approach, which produces results in predictable time while minimizing the allocation of hardware resources. The paper describes the advantages in terms of execution time reliability and the trade-offs regarding power/energy consumption and system temperature compared with the current GNU/Linux governors.
In this paper we present the ESPRESO FEM library, which includes a FEM toolbox with interfaces to professional and open-source simulation tools, and a massively parallel Hybrid Total FETI (HTFETI) solver which can fully utilize the OLCF Titan supercomputer and achieves super-linear scaling. This paper presents several new techniques for FETI solvers designed for efficient utilization of supercomputers, with a focus on: (i) performance: we present a fivefold reduction of solver runtime for the Laplace equation by redesigning the FETI solver and offloading the key workload to the accelerator. We compare Intel Xeon Phi 7120p and Tesla K80 and P100 accelerators to Intel Xeon E5-2680v3 and Xeon Phi 7210 CPUs; and (ii) memory efficiency: we present two techniques which increase the efficiency of the HTFETI solver 1.8 times and push the limits of the largest possible problem ESPRESO can solve from 124 to 223 billion unknowns for problems with unstructured meshes. Finally, we show that by dynamically tuning hardware parameters we can reduce energy consumption by up to 33 %.
For successful decision making in disaster management it is necessary to have very accurate information about disaster phenomena and their potential development in time. Rainfall-runoff simulations are an integral part of flood warning and decision making processes. To increase their accuracy, it is crucial to periodically update their parameters in a calibration process. Since calibration is a very time-consuming process, an HPC facility is a convenient tool for speeding it up. However, the required speed-up can be achieved only by avoiding any human-computer interaction, in so-called automatic calibration. In order to compare the possibilities and efficiency of automatic calibration, three different fully automatic parallel implementation strategies were created and tested with our in-house rainfall-runoff model.
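To make the idea of automatic calibration concrete, the following is a minimal sketch and not one of the three strategies from the paper, which the abstract does not describe. It assumes a toy linear-reservoir model with a single parameter `k`, a plain grid of candidate values, and RMSE as the fit criterion; `simulate`, `rmse` and `calibrate` are hypothetical names. Candidates are scored concurrently with a thread pool purely for illustration; an HPC implementation would more plausibly distribute the work across processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(k, rainfall, storage=0.0):
    """Toy linear-reservoir model (an assumption, not the in-house model):
    each step, a fraction k of the stored water leaves as discharge."""
    discharge = []
    for rain in rainfall:
        storage += rain
        q = k * storage
        storage -= q
        discharge.append(q)
    return discharge


def rmse(simulated, observed):
    """Root-mean-square error between simulated and observed discharge."""
    n = len(observed)
    return (sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n) ** 0.5


def calibrate(rainfall, observed, candidates, workers=4):
    """Score every candidate k concurrently and return (best_k, best_error).
    No human interaction is needed between evaluations."""
    def score(k):
        return rmse(simulate(k, rainfall), observed)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(score, candidates))
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best], errors[best]


if __name__ == "__main__":
    rain = [5.0, 0.0, 3.0, 10.0, 0.0, 0.0, 2.0]
    obs = simulate(0.3, rain)  # synthetic "observations" for the demo
    grid = [i / 20 for i in range(1, 20)]
    print(calibrate(rain, obs, grid))
```

Because every candidate is evaluated independently, the search parallelizes trivially; more elaborate optimizers trade some of that independence for fewer model runs.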
In this paper, we propose a safety-critical system with run-time resource management that is used to operate an application for flood monitoring and prediction. This application can run at different Quality of Service (QoS) levels depending on the current hydrometeorological situation. The system operation can follow two main scenarios: standard or emergency operation. Standard operation is active when no disaster occurs, but the system still executes short-term prediction simulations and monitors the state of the river discharge and precipitation intensity. Emergency operation is active when an emergency situation is detected or predicted by the simulations. Resource allocation can be used either for decreasing power consumption and minimizing the needed resources in standard operation, or for increasing the precision and decreasing response times in emergency operation. This paper shows that it is possible to describe different optimal points at design time and use them to adapt to the current quality of service requirements at run-time.
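The selection of a design-time operating point at run-time can be sketched as follows. This is only an illustration under stated assumptions: the `OperatingPoint` fields, the three example points and their numbers, and the `select_point` policy are all hypothetical and are not taken from the paper. The sketch assumes that standard operation prefers the lowest-power point that still meets the QoS limits, while emergency operation prefers the fastest feasible point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One configuration characterized at design time (hypothetical fields)."""
    name: str
    cores: int
    power_w: float      # average power draw
    response_s: float   # time to deliver a prediction
    precision: float    # relative result quality, higher is better


# Illustrative values only; real points would come from design-time profiling.
POINTS = [
    OperatingPoint("low", 4, 120.0, 600.0, 0.80),
    OperatingPoint("medium", 16, 400.0, 180.0, 0.90),
    OperatingPoint("high", 64, 1500.0, 45.0, 0.97),
]


def select_point(points, emergency, max_response_s, min_precision):
    """Pick an operating point for the current QoS requirements."""
    feasible = [
        p for p in points
        if p.response_s <= max_response_s and p.precision >= min_precision
    ]
    if not feasible:
        # Nothing meets the limits: fall back to the fastest available point.
        return min(points, key=lambda p: p.response_s)
    if emergency:
        # Emergency: shortest response time, ties broken by higher precision.
        return min(feasible, key=lambda p: (p.response_s, -p.precision))
    # Standard operation: minimize power among feasible points.
    return min(feasible, key=lambda p: p.power_w)


if __name__ == "__main__":
    print(select_point(POINTS, False, 600.0, 0.80).name)
    print(select_point(POINTS, True, 600.0, 0.80).name)
```

Keeping the points in a static table means the run-time decision is a cheap lookup, which suits a system that must react quickly when an emergency is detected.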