Thomas Guignon scite author profile

Accelerating VASP electronic structure calculations using graphic processing units

We present a way to improve the performance of the electronic structure program Vienna Ab initio Simulation Package (VASP). We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators can drastically reduce computation time when the most time-consuming sections of the code are offloaded to the graphics chips. The procedure consists of (i) profiling the code to isolate its time-consuming parts, (ii) rewriting these parts so that their algorithms are better suited to the chosen graphics accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor of 3 to 8 when running on n (CPU core + GPU) compared with n CPU cores only, without any loss of accuracy.
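Step (i) matters because the end-to-end gain is bounded by how much of the run is actually offloaded. A minimal sketch of that bound using Amdahl's law (a standard model, not a method claimed in the abstract): the offloaded fractions and the 20x kernel speedup below are hypothetical, and the helper `effective_kernel_speedup` assumes host-GPU transfers simply add to GPU time with no overlap, which is why step (iii) can change the outcome.

```python
def effective_kernel_speedup(cpu_time: float, gpu_compute_time: float, transfer_time: float) -> float:
    """Speedup of one offloaded section, assuming (hypothetically) that host<->GPU
    transfers are not overlapped with computation and simply add to GPU time."""
    return cpu_time / (gpu_compute_time + transfer_time)


def offload_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's-law bound on whole-run speedup when only `offloaded_fraction`
    of the original CPU time is accelerated by `kernel_speedup`."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - offloaded_fraction) + offloaded_fraction / kernel_speedup)


# Illustrative numbers only; they are not measurements from the VASP study.
kernel = effective_kernel_speedup(cpu_time=10.0, gpu_compute_time=0.4, transfer_time=0.1)
for fraction in (0.5, 0.8, 0.9, 0.95):
    overall = offload_speedup(fraction, kernel)
    print(f"offload {fraction:.0%} of the run at {kernel:.0f}x -> {overall:.2f}x overall")
```

Even with a 20x kernel, offloading 90% of the run caps the total gain just under 7x, and tripling the transfer time in this model would cut the kernel speedup itself, so both profiling and memory-traffic work feed directly into the overall result.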

Will GPGPUs be Finally a Credible Solution for Industrial Reservoir Simulators?

Anciaux–Sedrakian

Eaton

Gratien

et al. 2015

Scalability and Load-Balancing Problems in Parallel Reservoir Simulation

Gratien

Guignon

Magras

et al. 2007

New parallel reservoir simulator software designed for Linux clusters makes it possible to overcome hardware limitations and to simulate models with large amounts of data. The reservoir engineering industry is very interested in using ever-growing datasets with increasingly complex physics and detailed models. The key issue remains running simulations in an acceptable CPU time. Since the trend in hardware technology is not to drastically improve the performance of individual CPUs but to facilitate the aggregation of computing resources (high-bandwidth networks, multi-core architectures, ...), the challenge is to improve the efficiency of reservoir simulation software on a large number of processors.

New numerical difficulties and performance problems appear as the number of cells and the number of processors grow. The architecture of Linux clusters is very sensitive to memory distribution and load balancing:
• the cost of the parallel solver algorithm is usually sensitive to the size of the reservoir model (lack of scalability), and its consequences for CPU performance can no longer be neglected;
• the domain decomposition algorithms used to distribute data between processors strongly influence how the computing load is balanced between processors;
• adaptive numerical schemes with dynamic space criteria (AIM schemes, flash algorithms based on the thermodynamic state of each cell) are a source of imbalance that cannot be resolved statically;
• storing simulation results on irregular data structures, such as unstructured grids, multilateral smart wells and perforated cells, requires writing a large amount of information during the simulation. Given the variety of I/O subsystems found on Linux clusters, the simulator must be able to adapt its I/O strategy to the underlying I/O library, file system and hardware.

In this paper, we present different approaches to overcome these kinds of problems. We discuss technical choices such as:
• advanced scalable linear solver algorithms;
• load balancing with different domain decomposition strategies;
• dynamic space criteria, mesh partitioner strategy and parallel solver performance management;
• a flexible I/O strategy, from a simple file system to more complex parallel file systems or databases.

We have developed and benchmarked these solutions on published large-scale reference problems and on actual case studies with several tens of millions of cells. We analyze the results and discuss how efficiently each solution overcomes the scalability difficulties and performance limitations due to load imbalance.

Introduction

Improving the robustness and performance of parallel reservoir simulators on new high-performance computing architectures remains a key issue in dealing with the ever-growing complexity and size of reservoir models. The simulator discussed in this paper is a multipurpose parallel reservoir simulator that implements the physical options necessary for sophisticated reservoir engineering, such as black-oil, multi-c...
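The load-balancing issues listed above come down to how much work each processor receives. A minimal sketch of the effect, assuming a 1-D row of cells with hypothetical per-cell costs (for example, implicit cells in an AIM scheme costing more than explicit ones): splitting by cell count ignores those costs, while a cost-weighted contiguous split evens out the load. This is a toy greedy partitioner, not the mesh partitioner used in the paper, and since AIM costs change during a run, such a static split would have to be recomputed, which is exactly the difficulty the abstract raises.

```python
from typing import List

Partition = List[List[int]]


def equal_count_partition(n_cells: int, parts: int) -> Partition:
    """Naive split: contiguous blocks with (almost) the same number of cells."""
    size = -(-n_cells // parts)  # ceiling division
    return [list(range(start, min(start + size, n_cells))) for start in range(0, n_cells, size)]


def weighted_partition(costs: List[float], parts: int) -> Partition:
    """Greedy split: contiguous blocks with roughly equal total cost.

    A new block starts once the cost already assigned reaches the next multiple
    of total/parts; the last block takes whatever remains.
    """
    target = sum(costs) / parts
    blocks: Partition = [[] for _ in range(parts)]
    current, assigned = 0, 0.0
    for cell, cost in enumerate(costs):
        if current < parts - 1 and assigned >= target * (current + 1):
            current += 1
        blocks[current].append(cell)
        assigned += cost
    return blocks


def imbalance(costs: List[float], blocks: Partition) -> float:
    """Load-imbalance factor: max processor load / mean load (1.0 is perfect)."""
    loads = [sum(costs[c] for c in block) for block in blocks]
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical model: 4 expensive (e.g. implicit) cells followed by 8 cheap ones.
costs = [4.0] * 4 + [1.0] * 8
naive = equal_count_partition(len(costs), 4)
weighted = weighted_partition(costs, 4)
print("equal-count imbalance:", imbalance(costs, naive))
print("cost-weighted imbalance:", round(imbalance(costs, weighted), 3))
```

With these costs the equal-count split gives one processor twice the average load (factor 2.0), while the cost-weighted split brings the factor down to about 1.33, the best any contiguous 4-way split of these costs can reach.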

Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures

Anciaux–Sedrakian

Gottschling

Gratien

et al. 2014

Oil Gas Sci. Technol. – Rev. IFP Energies nouvelles

Résumé: Review of linear solver algorithms used in reservoir simulation that are efficient on modern hardware architectures. In recent years, high-performance computing vendors have increasingly turned to architectures based on multi-core processing units, possibly accelerated with GPGPU (General Purpose Processing on Graphics Processing Units) cards. Such architectures, which offer a large number of processing units, could be of great interest for the simulation of multiphase flow in porous media, used for example in CO2 geological sequestration or in enhanced oil recovery simulation for reservoirs. It remains to be verified, however, whether the algorithms in current software are suited to running efficiently on these new technologies. Solving large sparse linear systems is often the most expensive part of porous media flow simulators, because these systems are often ill-conditioned owing to the highly heterogeneous and anisotropic nature of geological data. For these reasons, linear solvers are critical to the performance of such simulators. In this paper, we present an overview of the linear solver and preconditioner algorithms used in our applications. We analyze their numerical efficiency and their performance on different hardware configurations. We propose a new approach, based on hybrid programming, that performs well on heterogeneous architectures built on multi-core processors or GPGPU accelerators. This approach is validated in an implementation of BiCGStab preconditioned with ILU(0), BSSOR, polynomial or CPR-AMG preconditioners. Performance tests were then carried out on various porous media flow case studies using large meshes.

Abstract: Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures. In the past few years, High Performance Computing (HPC) technologies have led to General Purpose Processing on Graphics Processing Units (GPGPU) and many-core architectures. These emerging technologies offer massive numbers of processing units and are of interest for porous media flow simulators that may be used for CO2 geological sequestration or Enhanced Oil Recovery (EOR) simulation. However, the crucial question is whether current algorithms and software can use these new technologies efficiently. Solving large sparse linear systems, which are often ill-conditioned, constitutes the most CPU-consuming part of such simulators. This paper proposes a survey of various solver and preconditioner...
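The abstract centres on preconditioned BiCGStab. A minimal, dependency-free sketch of right-preconditioned BiCGStab on a small dense nonsymmetric matrix, under stated assumptions: a simple Jacobi (diagonal) preconditioner stands in for the ILU(0), BSSOR, polynomial and CPR-AMG preconditioners named in the paper, breakdown handling is omitted, and the test matrix is hypothetical rather than a reservoir system.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def jacobi_preconditioner(a: Matrix) -> Callable[[Vector], Vector]:
    """Apply M^{-1} with M = diag(A); a stand-in for ILU(0), BSSOR, CPR-AMG, etc."""
    inv_diag = [1.0 / a[i][i] for i in range(len(a))]
    return lambda v: [d * vi for d, vi in zip(inv_diag, v)]


def bicgstab(a: Matrix, b: Vector, apply_m_inv: Callable[[Vector], Vector],
             tol: float = 1e-10, max_iter: int = 200) -> Tuple[Vector, int]:
    """Right-preconditioned BiCGStab from x0 = 0; no breakdown handling (sketch only)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                # r0 = b - A x0 with x0 = 0
    r_hat = list(r)            # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    stop = tol * (dot(b, b) ** 0.5 or 1.0)
    for it in range(1, max_iter + 1):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [rk + beta * (pk - omega * vk) for rk, pk, vk in zip(r, p, v)]
        y = apply_m_inv(p)
        v = matvec(a, y)
        alpha = rho / dot(r_hat, v)
        x = axpy(alpha, y, x)
        s = axpy(-alpha, v, r)          # intermediate residual
        if dot(s, s) ** 0.5 <= stop:
            return x, it
        z = apply_m_inv(s)
        t = matvec(a, z)
        omega = dot(t, s) / dot(t, t)   # minimizes ||s - omega * t||
        x = axpy(omega, z, x)
        r = axpy(-omega, t, s)
        if dot(r, r) ** 0.5 <= stop:
            return x, it
        rho_old = rho
    return x, max_iter


# Hypothetical nonsymmetric, diagonally dominant test matrix (not from the paper).
A = [[4.0, -1.0, 0.0, 0.0],
     [-2.0, 4.0, -1.0, 0.0],
     [0.0, -2.0, 4.0, -1.0],
     [0.0, 0.0, -2.0, 4.0]]
x_true = [1.0, 2.0, 3.0, 4.0]
b = matvec(A, x_true)
x, iterations = bicgstab(A, b, jacobi_preconditioner(A))
print(iterations, [round(xi, 6) for xi in x])
```

The preconditioner is passed in as a function, so swapping Jacobi for a stronger preconditioner changes only the `apply_m_inv` argument, which mirrors how the paper pairs one Krylov solver with several preconditioners.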

Combining reduction with synchronization barrier on multi‐core processors

Aboul-Karim

Giraud

Guermouche

et al. 2022

Concurrency and Computation

Summary: With the rise of multi-core processors with large numbers of cores, the need for shared-memory reductions that perform efficiently at that scale has become more pressing. Efficient shared-memory reduction on these processors will help shared-memory programs run more efficiently. In this article, we propose a reduction method combined with a barrier method that uses SIMD read/write instructions to combine barrier signaling with the reduction value, minimizing memory/cache traffic between cores and thereby reducing barrier latency. We compare different barrier and reduction methods on three multi-core processors and show that the proposed combined barrier/reduction methods are 4 and 3.5 times faster than the OpenMP 4.5 reductions of GCC 11.1 and Intel 21.2, respectively.
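The core idea, folding the reduction value into the barrier's arrival step so that one synchronization episode does both jobs, can be illustrated at a much higher level than the article works at. A minimal sketch in Python, assuming a lock and condition variable in place of the SIMD read/write instructions the authors use to pack the barrier signal and value together; it shows only the combining logic and says nothing about the cache-traffic savings measured in the article. `ReducingBarrier` and `parallel_sum` are hypothetical names.

```python
import threading
from operator import add
from typing import Callable, List


class ReducingBarrier:
    """Barrier whose arrival step also folds in each thread's partial value.

    Arrival counting and accumulation happen in the same critical section, and the
    last thread to arrive publishes the combined value before releasing the others,
    so one synchronization episode replaces "reduce, then barrier".
    """

    def __init__(self, parties: int, op: Callable[[float, float], float] = add,
                 identity: float = 0.0) -> None:
        self._parties = parties
        self._op = op
        self._identity = identity
        self._cond = threading.Condition()
        self._count = 0
        self._acc = identity
        self._result = identity
        self._generation = 0  # bumped once per completed episode

    def reduce_and_wait(self, value: float) -> float:
        with self._cond:
            generation = self._generation
            self._acc = self._op(self._acc, value)
            self._count += 1
            if self._count == self._parties:
                # Last arrival: publish, reset for the next episode, release everyone.
                self._result = self._acc
                self._acc = self._identity
                self._count = 0
                self._generation += 1
                self._cond.notify_all()
            else:
                while generation == self._generation:
                    self._cond.wait()
            # Safe to read: the next episode cannot complete until this thread arrives again.
            return self._result


def parallel_sum(data: List[int], n_threads: int) -> List[float]:
    """Each thread sums a strided slice; every thread receives the global total."""
    barrier = ReducingBarrier(n_threads)
    results: List[float] = [0.0] * n_threads

    def worker(tid: int) -> None:
        partial = float(sum(data[tid::n_threads]))
        results[tid] = barrier.reduce_and_wait(partial)

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(parallel_sum(list(range(1000)), 4))
```

Passing `op=max` with `identity=float("-inf")` turns the same barrier into a max-reduction; only the combining operator changes, not the synchronization.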

Thomas Guignon

Accelerating VASP electronic structure calculations using graphic processing units

Will GPGPUs be Finally a Credible Solution for Industrial Reservoir Simulators?

Scalability and Load-Balancing Problems in Parallel Reservoir Simulation

Survey on Efficient Linear Solvers for Porous Media Flow Models on Recent Hardware Architectures

Combining reduction with synchronization barrier on multi‐core processors
