Tuning parallel applications in parallel
2009 · Parallel Computing · DOI: 10.1016/j.parco.2009.07.001

Abstract: Auto-tuning has recently received significant attention from the High Performance Computing community. Most auto-tuning approaches are specialized to work either on specific domains such as dense linear algebra and stencil computations, or only at certain stages of program execution such as compile time and runtime. Real scientific applications, however, demand a cohesive environment that can efficiently provide auto-tuning solutions at all stages of application development and deployment. Towards that end, we …

Cited by 13 publications (8 citation statements, all of type "mentioning"); references 66 publications.

Citation statements, ordered by relevance:

“…In this work multiple local searches are used to find the estimate of the global optimum independently. This is in contrast to the most popular hybrid global optimisation methods, where local search is used only to refine x* in the regions suggested by the initial stage of the global search [24][25][26]. While a hybrid scheme can reduce the required number of starting points and thus the number of evaluations of the objective function, it necessarily introduces latency because the choice of new starting points is made after x* has been evaluated from some previous starting points.…”
Section: Modelling and Simulation in Engineering · mentioning · confidence: 99%
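
The contrast this statement draws, independent multi-start local search versus hybrid schemes whose local stage waits on earlier results, lends itself to a short sketch. The Python code below is a minimal illustration under assumed names: the objective f, the step rule, and all search parameters are invented here and are not taken from the cited work or from the paper.

# Minimal sketch of independent multi-start local search, as contrasted with
# hybrid global/local schemes in the quoted statement. The objective f, the
# step rule, and the parameter choices are illustrative assumptions only.
import random
from concurrent.futures import ProcessPoolExecutor

def f(x):
    # Hypothetical expensive objective (stands in for a program's runtime).
    return (x - 3.7) ** 2 + 2.0

def local_search(x0, step=0.5, iters=100):
    """Simple hill-climbing-style local refinement from one starting point."""
    x, fx = x0, f(x0)
    for _ in range(iters):
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
        step *= 0.9  # shrink the step as the search settles
    return x, fx

if __name__ == "__main__":
    starts = [random.uniform(-10, 10) for _ in range(8)]
    # Each local search runs independently from its own starting point, so
    # all eight proceed in parallel with no coordination between them.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(local_search, starts))
    x_best, f_best = min(results, key=lambda r: r[1])
    print(f"best x = {x_best:.3f}, f = {f_best:.3f}")

Because no search depends on another's result, all eight refinements can run concurrently; a hybrid scheme would instead wait for some evaluations of the objective before choosing where to refine, which is the latency the statement refers to.
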
“…For example, MATE was able to scale up to 32 cores, as shown in Caymes-Scutari et al. [16] and Morajko et al. [17], while Active Harmony scaled up to 128 cores, as described in Tiwari et al. [18] …”
Section: Validation of Proposed Model for Hierarchical Dynamic Tuning · mentioning · confidence: 99%

“…Moreover, it is worth noticing the huge improvement that the ELASTIC approach represents in comparison to previous centralized dynamic tuning tools. For example, MATE was able to scale up to 32 cores as shown in Caymes-Scutari et al. and Morajko et al., while Active Harmony scaled up to 128 cores as described in Tiwari et al.…”
Section: Experimental Assessment · mentioning · confidence: 99%

“…As Tiwari et al. have noted, the losses incurred by evaluating several poor solutions can easily outweigh the benefits of discovering an excellent solution [12].…”
Section: Search Efficiency · mentioning · confidence: 99%
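
The cost-benefit point in this statement, that online evaluation of poor configurations can erase the gain from finding a good one, can be illustrated with back-of-the-envelope arithmetic. The Python sketch below uses invented numbers; none of them are measurements from the paper or the citing work.

# Illustrative cost model for online auto-tuning; all numbers are invented.
baseline = 10.0        # seconds per timestep with the default configuration
best = 8.0             # seconds per timestep with the best configuration found
poor_penalty = 2.5     # avg extra seconds per timestep while a poor candidate runs
search_steps = 400     # timesteps spent evaluating candidate configurations
remaining_steps = 600  # timesteps run with the tuned configuration afterwards

cost_of_search = search_steps * poor_penalty            # 1000 s lost to bad candidates
gain_from_tuning = remaining_steps * (baseline - best)  # 1200 s saved once tuned

print(f"search cost: {cost_of_search:.0f} s, tuning gain: {gain_from_tuning:.0f} s")
# With only 400 remaining timesteps the gain falls to 800 s and the search is
# a net loss, which is the trade-off the quoted statement describes.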