We describe the main features and discuss the tuning of algorithms for the direct solution of sparse linear systems on distributed memory computers, developed in the context of PARASOL ESPRIT IV LTR Project No 20160. The algorithms use a multifrontal approach and are especially designed to cover a large class of problems. The problems can be symmetric positive definite, general symmetric, or unsymmetric matrices, all possibly rank deficient, and they can be provided by the user in several formats. The algorithms achieve high performance by exploiting both the parallelism coming from the sparsity in the problem and that available for dense matrices. The algorithms use a dynamic distributed task scheduling technique to accommodate numerical pivoting and to allow the migration of computational tasks to lightly loaded processors. Large computational tasks are divided into subtasks to enhance parallelism. Asynchronous communication is used throughout the solution process for the efficient overlap of communication and computation. We illustrate our design choices by experimental results obtained on a Cray SGI Origin 2000 and an IBM SP2 for test matrices provided by industrial partners in the PARASOL project.
In this paper, we consider the problem of designing a dynamic scheduling strategy that takes into account both workload and memory information in the context of the parallel multifrontal factorization. The originality of our approach is that we base our estimations (work and memory) on a static optimistic scenario during the analysis phase. This scenario is then used during the factorization phase to constrain the dynamic decisions. The task scheduler has been redesigned to take into account these new features. Moreover, performance has been improved because the new constraints allow the new scheduler to make optimal decisions that were forbidden or too dangerous in unconstrained formulations. Performance analysis shows that the memory estimation becomes much closer to the memory effectively used and that, even in a constrained memory environment, we decrease the factorization time with respect to the initial approach.
Key-words: sparse matrices, parallel multifrontal method, dynamic scheduling, memory.
This text is also available as a research report of the Laboratoire de l'Informatique du Parallélisme (http://www.ens-lyon.fr/LIP) and as a technical report from ENSEEIHT-IRIT.
Hybrid scheduling strategies for the parallel solution of linear systems
Summary (translated from the French): We propose bi-criteria scheduling strategies that address both the performance and the memory consumption of a parallel sparse matrix factorization algorithm based on the multifrontal method. The originality of our approach is that we base our memory estimates on an optimistic scenario (a simulation during the analysis phase), which is then used during the factorization to constrain the dynamic scheduling decisions. A new scheduler has been implemented that takes these new constraints into account. Moreover, performance has been improved because our new approach allows the scheduler to make better decisions that were previously forbidden or too dangerous. A performance analysis shows that the memory estimates are much closer to the memory effectively used, and that the factorization time is significantly improved with respect to the initial approach.
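The idea of constraining dynamic decisions with a static memory estimate can be illustrated with a minimal sketch. This is not the scheduler described in the abstract: the `Proc` fields, the `pick_workers` function, and all numbers are hypothetical, and the selection rule (least loaded first, among processors that stay under a per-processor memory bound) is only one plausible reading of "workload and memory information".

```python
from dataclasses import dataclass


@dataclass
class Proc:
    rank: int
    load: float       # estimated pending work (hypothetical unit)
    mem_used: float   # memory currently in use (hypothetical unit)
    mem_limit: float  # assumed bound taken from an analysis-phase estimate


def pick_workers(procs, task_mem, n_workers):
    """Return ranks of up to n_workers least-loaded processors
    that can accept task_mem without exceeding their memory bound."""
    # Memory acts as a hard constraint on the candidate set.
    feasible = [p for p in procs if p.mem_used + task_mem <= p.mem_limit]
    # Workload is the criterion among the feasible candidates;
    # the rank breaks ties deterministically.
    feasible.sort(key=lambda p: (p.load, p.rank))
    return [p.rank for p in feasible[:n_workers]]


procs = [
    Proc(0, load=5.0, mem_used=80.0, mem_limit=100.0),
    Proc(1, load=1.0, mem_used=95.0, mem_limit=100.0),
    Proc(2, load=3.0, mem_used=40.0, mem_limit=100.0),
    Proc(3, load=2.0, mem_used=60.0, mem_limit=100.0),
]
chosen = pick_workers(procs, task_mem=10.0, n_workers=2)
print(chosen)
```

In this toy setting, processor 1 has the lightest load but is excluded because the task would push it past its memory bound, so a purely workload-based choice and a memory-constrained choice differ.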
We consider the solution of both symmetric and unsymmetric systems of sparse linear equations. A new parallel distributed memory multifrontal approach is described. To handle numerical pivoting efficiently, a parallel asynchronous algorithm with dynamic scheduling of the computing tasks has been developed. We discuss some of the main algorithmic choices and compare both implementation issues and the performance of the LDL^T and LU factorizations. Performance analysis on an IBM SP2 shows the efficiency and the potential of the method. The test problems used are from the Rutherford-Boeing collection and from the PARASOL end users.
We present a finite-difference frequency-domain method for 3D visco-acoustic wave propagation modeling. In the frequency domain, the underlying numerical problem is the solution of a large sparse system of linear equations whose right-hand side term is the source. This system is solved with a massively parallel direct solver. We first present an optimal 3D finite-difference stencil for frequency-domain modeling. The method is based on a parsimonious staggered-grid method. Differential operators are discretized with second-order accurate staggered-grid stencils on different rotated coordinate systems to mitigate numerical anisotropy. An antilumped mass strategy is implemented to minimize numerical dispersion. The stencil incorporates 27 grid points and spans two grid intervals. Dispersion analysis shows that four grid points per wavelength provide accurate simulations in the 3D domain. To assess the feasibility of the method for frequency-domain full-waveform inversion, we computed simulations in the 3D SEG/EAGE overthrust model for frequencies 5, 7, and 10 Hz. Results confirm the huge memory requirement of the factorization (several hundred gigabytes) but also the CPU efficiency of the solution phase (a few seconds per shot). Heuristic scalability analysis suggests that the memory complexity of the factorization is O(35N^4) for an N^3 grid. Our method may provide a suitable tool to perform frequency-domain full-waveform inversion using a large distributed-memory platform. Further investigation is still necessary to assess more quantitatively the respective merits and drawbacks of time- and frequency-domain modeling of wave propagation to perform 3D full-waveform inversion.
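The two scaling statements above (four grid points per wavelength, and factorization memory growing as N^4 for an N^3 grid) can be combined in a short worked sketch. The function names and the minimum velocity of 2000 m/s are assumptions chosen for illustration; they are not values from the study.

```python
def grid_spacing(v_min_m_s, freq_hz, points_per_wavelength=4):
    """Grid interval (m) that samples the shortest wavelength
    with the requested number of points."""
    shortest_wavelength = v_min_m_s / freq_hz
    return shortest_wavelength / points_per_wavelength


def factor_memory_growth(n_old, n_new):
    """Relative growth of factorization memory under the O(N^4)
    heuristic, where N is the number of grid points per dimension."""
    return (n_new / n_old) ** 4


V_MIN = 2000.0  # hypothetical minimum wave speed in m/s

for freq in (5.0, 7.0, 10.0):
    print(f"{freq:4.1f} Hz -> grid interval {grid_spacing(V_MIN, freq):.1f} m")

# Doubling the frequency halves the grid interval, which doubles N
# in each dimension for a fixed physical domain.
print(factor_memory_growth(100, 200))
```

Under these assumptions, going from 5 Hz to 10 Hz halves the grid interval, doubles N, and multiplies the estimated factorization memory by 16, which is consistent with the memory requirement being the limiting factor reported above.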
Matrices coming from elliptic Partial Differential Equations have been shown to have a low-rank property which can be efficiently exploited in multifrontal solvers to provide a substantial reduction of their complexity. Among the possible low-rank formats, the Block Low-Rank format (BLR) is easy to use in a general purpose multifrontal solver and its potential compared to standard (full-rank) solvers has been demonstrated. Recently, new variants have been introduced and it was proved that they can further reduce the complexity, but their performance has never been analyzed. In this paper, we present a multithreaded BLR factorization, and analyze its efficiency and scalability in shared-memory multicore environments. We identify the challenges posed by the use of BLR approximations in multifrontal solvers and put forward several algorithmic variants of the BLR factorization that overcome these challenges by improving its efficiency and scalability. We illustrate the performance analysis of the BLR multifrontal factorization with numerical experiments on a large set of problems coming from a variety of real-life applications.
Additional Key Words and Phrases: sparse linear algebra, multifrontal factorization, Block Low-Rank, multicore architectures
Abstract. Matrices coming from elliptic Partial Differential Equations have been shown to have a low-rank property: well defined off-diagonal blocks of their Schur complements can be approximated by low-rank products, and this property can be efficiently exploited in multifrontal solvers to provide a substantial reduction of their complexity. Among the possible low-rank formats, the Block Low-Rank format (BLR) is easy to use in a general purpose multifrontal solver and has been shown to provide significant gains compared to full-rank on practical applications. However, unlike hierarchical formats, such as H and HSS, its theoretical complexity was unknown. In this paper, we extend the theoretical work done on hierarchical matrices in order to compute the theoretical complexity of the BLR multifrontal factorization. We then present several variants of the BLR multifrontal factorization, depending on the strategies used to perform the updates in the frontal matrices and on the constraints on how numerical pivoting is handled. We show how these variants can further reduce the complexity of the factorization. In the best case (3D, constant ranks), we obtain a complexity of the order of O(n^(4/3)). We provide an experimental study with numerical results to support our complexity bounds.
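To make the storage benefit of approximating off-diagonal blocks by low-rank products concrete, here is a minimal counting sketch. It assumes a simplified layout that is not taken from the paper: a dense n x n matrix cut into square blocks of size b, full-rank diagonal blocks, and every off-diagonal block stored as a product of two b x r factors with the same rank r. The function names and the sizes are hypothetical.

```python
def full_rank_entries(block_size):
    """Entries stored for a dense block_size x block_size block."""
    return block_size * block_size


def low_rank_entries(block_size, rank):
    """Entries stored for a block kept as X * Y^T,
    with X and Y both of size block_size x rank."""
    return 2 * block_size * rank


def blr_storage(n, block_size, rank):
    """Entries stored under the simplified block layout:
    dense diagonal blocks, uniform-rank off-diagonal blocks."""
    if n % block_size != 0:
        raise ValueError("block_size must divide n in this sketch")
    p = n // block_size  # number of block rows (and block columns)
    diagonal = p * full_rank_entries(block_size)
    off_diagonal = p * (p - 1) * low_rank_entries(block_size, rank)
    return diagonal + off_diagonal


n, b, r = 4096, 256, 8  # hypothetical sizes
dense = n * n
compressed = blr_storage(n, b, r)
print(dense, compressed, dense / compressed)
```

With these numbers the compressed layout stores roughly one eighth of the dense entries. The saving disappears once r exceeds b/2, since each low-rank block then costs more than its dense counterpart, which is why the behavior of the ranks matters in the complexity results.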