Abstract: The Preconditioned Conjugate Gradient method is often employed for the solution of linear systems of equations arising in numerical simulations of physical phenomena. While being widely used, the solver is also known for its lack of accuracy while computing the residual. In this article, we propose two algorithmic solutions that originate from the ExBLAS project to enhance the accuracy of the solver as well as to ensure its reproducibility in a hybrid MPI + OpenMP tasks programming environment. One is based on…
“…Further, they realized the reproducibility of the pure MPI parallel Preconditioned BiCGSTAB algorithm on the CPU based on ExBLAS, use Jacobi preconditioner [18]. Furthermore, they have also achieved reproducibility in the MPI+OpenMP environment [19]. Mukunoki et al realizes the reproducibility of the CG solver on the CPU and GPU [10].…”
Krylov subspace algorithms are important methods for solving linear systems. Solving large-scale linear systems quickly requires parallel techniques. However, parallelism often amplifies the effects of the non-associativity of floating-point operations, which can make the computations non-reproducible. This paper compares the performance of the parallel preconditioned BiCGSTAB algorithm implemented with two different libraries (ExBLAS and ReproBLAS) that can ensure reproducibility of the computations. To address the effect of the compiler, we explicitly use FMA instructions. Numerical experiments show that the BiCGSTAB algorithms based on both BLAS implementations are reproducible: the ExBLAS-based algorithm is more accurate but more time-consuming, while the ReproBLAS-based algorithm is somewhat less accurate but less expensive.
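The non-associativity mentioned in the abstract above can be illustrated with a minimal sketch. It is not ExBLAS or ReproBLAS code: `naive_reduce` and `exact_reduce` are hypothetical names, the "ranks" are simulated by slicing a list, and exact rational arithmetic from Python's `fractions` module stands in for an exact accumulator that is rounded only once at the end.

```python
from fractions import Fraction
import math
import random

def naive_reduce(values, num_ranks):
    """Plain FP64 reduction: each simulated rank sums its chunk, then partials are added."""
    size = math.ceil(len(values) / num_ranks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)

def exact_reduce(values, num_ranks):
    """Same chunking, but partial sums are kept exact and rounded once at the end.

    Fraction(v) converts a float exactly, and float(Fraction) rounds correctly,
    so the result cannot depend on how the data is split (assumption: this
    models an exact accumulator; real libraries use far faster techniques).
    """
    size = math.ceil(len(values) / num_ranks)
    partials = [sum((Fraction(v) for v in values[i:i + size]), Fraction(0))
                for i in range(0, len(values), size)]
    return float(sum(partials, Fraction(0)))

# Floating-point addition is not associative: the two groupings differ.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

# Values with widely varying magnitudes make order effects more likely to show.
rng = random.Random(0)
data = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(500)]

for ranks in (1, 2, 3, 7, 16):
    print(ranks, repr(naive_reduce(data, ranks)), repr(exact_reduce(data, ranks)))
```

The naive column may change with the number of simulated ranks, whereas the exact column is the same for every split because rounding happens only once.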
“…The above ExBLAS approach has been extended to CG methods [8,9]. They implemented the CG solver with the Jacobi preconditioner on distributed environments using the pure MPI as well as MPI + OpenMP tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, it is not so large within 100 iterations. [8,9] based on the ExBLAS approach [10]. These CG solvers are parallelized with the flat MPI as well as MPI and OpenMP tasks but support only CPUs.…”
Section: Performance (Overhead) (mentioning)
confidence: 99%
“…The ExBLAS-based implementations have two versions: the MPI-OpenMP hybrid parallel [8] and the flat MPI version [9]. However, the flat MPI version was faster than the hybrid version on both CPU1 and CPU2.…”
Section: Comparison With ExBLAS-based CG (mentioning)
confidence: 99%
“…However, the level of accuracy needed to ensure a certain level of reproducibility is problem (input) dependent. Iakymchuk et al [8,9] demonstrated the achievement of reproducibility with only an accurate computation method (only with FPE, i.e., the ExBLAS scheme without a long accumulator) by focusing on certain problems. It contributes more toward improving performance than the ExBLAS approach does.…”
Section: Reproducibility Without Correctly Rounded Operations and Acc... (mentioning)
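The floating-point expansion (FPE) idea referred to in the statement above can be sketched as follows. This is an illustrative assumption rather than the ExBLAS implementation: it uses Knuth's TwoSum error-free transformation, the names `two_sum`, `fpe_accumulate`, and `fpe_sum` are hypothetical, and any error that does not fit in the expansion is simply dropped, since no long accumulator is modeled here.

```python
import math

def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and s + e == a + b exactly
    (barring overflow)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def fpe_accumulate(values, size=4):
    """Accumulate values into an expansion of `size` doubles.

    Each incoming value is pushed through the expansion; the rounding error of
    one component is passed on to the next. A nonzero error left after the last
    component is dropped in this sketch (assumption: no long accumulator).
    """
    fpe = [0.0] * size
    for x in values:
        for i in range(size):
            fpe[i], x = two_sum(fpe[i], x)
            if x == 0.0:
                break
    return fpe

def fpe_sum(values, size=4):
    """Round the expansion to a single double (math.fsum sums the components accurately)."""
    return math.fsum(fpe_accumulate(values, size))

values = [1e16, 1.0, -1e16]
print(sum(values))      # plain FP64 summation loses the 1.0
print(fpe_sum(values))  # the expansion keeps it
```

Whether a given expansion size is sufficient depends on the input, which matches the statement that the accuracy needed for reproducibility is problem dependent.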
In Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase due to the loss of computational accuracy caused by rounding errors in floating-point computations. At the same time, because the order of computation is nondeterministic in parallel computation, the result and the convergence behavior may differ between computational environments, even for the same input. In this study, we present an accurate and reproducible implementation of the unpreconditioned CG method on x86 CPUs and NVIDIA GPUs. In our method, while all variables are stored in FP64, all inner product operations (including matrix-vector multiplications) are performed using the Ozaki scheme. The scheme delivers correctly rounded computation as well as bit-level reproducibility among different computational environments. In this paper, we show some examples where the standard FP64 implementation of CG produces nonidentical results across different CPUs and GPUs. We then demonstrate the applicability and effectiveness of our approach in terms of accuracy and reproducibility, as well as its performance on both CPUs and GPUs. Furthermore, we compare the performance of our method against an existing accurate and reproducible CG implementation based on the Exact Basic Linear Algebra Subprograms (ExBLAS) on CPUs.
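The structure described in this abstract, with variables stored in FP64 and every inner product correctly rounded, can be sketched on a tiny dense system. This is not the Ozaki scheme: exact rational arithmetic from Python's `fractions` module is used here as a slow stand-in that also yields correctly rounded inner products, and the names `exact_dot`, `matvec`, and `cg` are hypothetical.

```python
from fractions import Fraction

def exact_dot(x, y):
    """Correctly rounded inner product: exact rational accumulation, one final rounding."""
    return float(sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0)))

def matvec(A, x):
    """Dense matrix-vector product built from correctly rounded inner products."""
    return [exact_dot(row, x) for row in A]

def cg(A, b, tol=1e-12, max_iter=100):
    """Unpreconditioned CG for a symmetric positive definite A (list of rows).

    All vectors are stored as FP64; only the reductions are computed exactly.
    Returns the approximate solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual for the initial guess x = 0
    p = list(r)
    rr = exact_dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rr / exact_dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = exact_dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = cg(A, b)
print(x, iters)

# Reordering the unknowns changes the order of terms in every reduction,
# yet the solution components should match bit for bit.
A_perm = [[3.0, 1.0], [1.0, 4.0]]
b_perm = [2.0, 1.0]
x_perm, _ = cg(A_perm, b_perm)
print(x_perm == [x[1], x[0]])
```

Because each reduction is rounded only once from its exact value, the order in which terms are accumulated cannot affect the result, which is the property that gives bit-level reproducibility in this sketch.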