Abstract: Our problem is to accurately solve linear systems on a general-purpose graphics processing unit with double double and quad double arithmetic. The linear systems originate from applying Newton's method to polynomial systems. Newton's method is applied as a corrector in a path tracking method, so the linear systems are solved in sequence rather than simultaneously. One solution path may require the solution of thousands of linear systems. In previous work we reported good speedups with our implementation …
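Double double arithmetic represents a number as an unevaluated sum hi + lo of two hardware doubles. The sketch below uses the standard error-free transformations (Knuth's two-sum and Dekker's splitting product) on which double double libraries such as QD are built. It is a plain-Python illustration on the CPU, not the GPU code the abstract refers to; all function names are hypothetical.

```python
# A double double is a pair (hi, lo) of Python floats (IEEE doubles)
# whose exact sum hi + lo is the represented value.

def two_sum(a, b):
    """Knuth's error-free sum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Error-free sum, valid when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


SPLITTER = 134217729.0  # 2**27 + 1, the constant in Dekker's splitting


def split(a):
    """Split a double into a high and a low part of at most 26 bits each."""
    t = SPLITTER * a
    hi = t - (t - a)
    return hi, a - hi


def two_prod(a, b):
    """Dekker's error-free product: a * b == p + e exactly (barring overflow)."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e


def dd_add(x, y):
    """Add two double doubles (the cheaper "sloppy" variant)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


def dd_mul(x, y):
    """Multiply two double doubles."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)
```

For example, `1.0 + 1e-20` rounds to `1.0` in hardware doubles, whereas `dd_add((1.0, 0.0), (1e-20, 0.0))` keeps the small term in the low part. Each double double operation costs several hardware flops, which is the overhead the abstract proposes to offset with a GPU.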
“…In [43], we continued this line of investigation on the GPU, based on our GPU implementations of evaluation and differentiation algorithms [40], combined with our GPU implementation of the Gram-Schmidt orthogonalization method [41]. The computational results in [40] and [41] were on randomly generated data. The data in this paper comes from polynomial systems that are relevant to actual applications.…”
Section: Introduction (mentioning, confidence: 99%)
“…In particular, the cyclic n-root problems relate to the construction of complex Hadamard matrices [36] and the Pieri homotopies solve the output pole placement problem in linear systems control [10]. This paper reports on improvements and the integration of our building blocks (described in [40,41,43]) in an accelerated path tracker. Good speedups relative to the CPU are obtained on benchmark problems large enough to compensate for the computational overhead caused by the double double arithmetic.…”
Section: Introduction (mentioning, confidence: 99%)
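The cyclic n-roots problem mentioned in the snippet above is a standard benchmark system. A minimal sketch that evaluates it, assuming the usual formulation (for k = 1, …, n-1, the sum over i of the products of k cyclically consecutive variables starting at x_i equals zero, and the product of all variables equals one); the function name `cyclic_n_roots` is hypothetical:

```python
import cmath
from math import prod


def cyclic_n_roots(x):
    """Return the residuals of the cyclic n-roots system at x, with n = len(x)."""
    n = len(x)
    residuals = []
    for k in range(1, n):
        # Sum over i of x_i * x_{i+1} * ... * x_{i+k-1}, indices taken mod n.
        residuals.append(sum(prod(x[(i + j) % n] for j in range(k)) for i in range(n)))
    residuals.append(prod(x) - 1)  # last equation: x_0 * x_1 * ... * x_{n-1} = 1
    return residuals


if __name__ == "__main__":
    # For odd n, x_j = w**j with w = exp(2*pi*i/n) solves the system.
    w = cmath.exp(2j * cmath.pi / 5)
    print(max(abs(r) for r in cyclic_n_roots([w**j for j in range(5)])))
```

The check in the usage block relies on a short argument: with x_j = w^j, the k-th equation equals w^(k(k-1)/2) times the sum of w^(ki) over i, which vanishes for 1 ≤ k ≤ n-1, and the product w^(n(n-1)/2) equals 1 when n is odd.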
“…In [39], we experimentally showed that the cost overhead of double double arithmetic is of similar magnitude to the cost of complex double arithmetic and that eight CPU cores may suffice to offset this overhead in multithreaded implementations. In [43], we continued this line of investigation on the GPU, based on our GPU implementations of evaluation and differentiation algorithms [40], combined with our GPU implementation of the Gram-Schmidt orthogonalization method [41]. The computational results in [40] and [41] were on randomly generated data.…”
Numerical continuation methods track a solution path defined by a homotopy. The systems we consider are defined by polynomials in several variables with complex coefficients. For larger dimensions and degrees, the numerical conditioning worsens and hardware double precision often becomes insufficient to reach the end of the solution path. With double double and quad double arithmetic, we can solve larger problems that we could not solve with hardware double arithmetic, but at a higher computational cost. This cost overhead can be offset by acceleration on a Graphics Processing Unit (GPU). We describe our implementation and report on computational results on benchmark polynomial systems.
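To make "tracking a solution path defined by a homotopy" concrete, here is a minimal predictor-corrector sketch for a single variable, in hardware double precision. It assumes the common homotopy h(x, t) = (1 - t)·γ·g(x) + t·f(x) with a complex constant γ (the "gamma trick"), an Euler predictor, a Newton corrector, and a fixed step size. The function name and parameters are hypothetical; production trackers work on systems, adapt the step size, and, as described above, may need extended precision.

```python
def track(f, df, g, dg, x0, gamma=complex(0.8, 0.6), steps=200, newton_iters=3):
    """Track one path of h(x, t) = (1 - t) * gamma * g(x) + t * f(x) from t = 0 to 1.

    f, df: target polynomial and its derivative; g, dg: start polynomial and its
    derivative; x0: a root of g. Fixed steps, Euler predictor, Newton corrector.
    """
    def h(x, t):
        return (1 - t) * gamma * g(x) + t * f(x)

    def h_x(x, t):  # partial derivative of h with respect to x
        return (1 - t) * gamma * dg(x) + t * df(x)

    def h_t(x):  # partial derivative of h with respect to t (does not depend on t)
        return f(x) - gamma * g(x)

    x, t = complex(x0), 0.0
    for step in range(1, steps + 1):
        t_next = step / steps
        # Predictor: one Euler step along the path, dx/dt = -h_t / h_x.
        x = x - (t_next - t) * h_t(x) / h_x(x, t)
        t = t_next
        # Corrector: a few Newton iterations on h(x, t) = 0 with t fixed.
        for _ in range(newton_iters):
            x = x - h(x, t) / h_x(x, t)
    return x
```

For instance, tracking f(x) = x² - 2 from the start system g(x) = x² - 1 moves the start roots 1 and -1 to √2 and -√2. In the multivariate setting, each Newton step in the corrector becomes the solution of a linear system with the Jacobian matrix, which is where the sequence of linear systems in the first abstract comes from.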
“…In each Newton step, we solve the linear systems in the least squares sense via the modified Gram-Schmidt method. In [24] and [25] our computations were executed on randomly generated regular data sets. In [27], we integrated and improved the evaluation and differentiation codes to run Newton's method on some selected benchmark polynomial systems.…”
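Solving a linear system "in the least squares sense via the modified Gram-Schmidt method" means factoring A = QR with orthonormal columns in Q and upper triangular R, then solving R x = Qᴴ b by back substitution. The sketch below does this in plain Python with complex doubles, with the matrix stored as a list of rows (an assumption for illustration). It is not the authors' GPU code, and the function names are hypothetical.

```python
def mgs_qr(A):
    """Modified Gram-Schmidt QR of an m-by-n matrix A (list of rows), m >= n,
    assuming full column rank. Returns Q (m-by-n) and R (n-by-n)."""
    m, n = len(A), len(A[0])
    Q = [list(row) for row in A]  # the columns of Q overwrite a copy of A
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = sum(abs(Q[i][k]) ** 2 for i in range(m)) ** 0.5
        for i in range(m):
            Q[i][k] /= R[k][k]
        for j in range(k + 1, n):
            # Project the already updated column j onto q_k; using the updated
            # column is what distinguishes modified from classical Gram-Schmidt.
            R[k][j] = sum(Q[i][k].conjugate() * Q[i][j] for i in range(m))
            for i in range(m):
                Q[i][j] -= R[k][j] * Q[i][k]
    return Q, R


def least_squares(A, b):
    """Minimize ||A x - b||_2 by solving R x = Q^H b with back substitution."""
    m, n = len(A), len(A[0])
    Q, R = mgs_qr(A)
    y = [sum(Q[i][k].conjugate() * b[i] for i in range(m)) for k in range(n)]
    x = [0j] * n
    for k in reversed(range(n)):
        x[k] = (y[k] - sum(R[k][j] * x[j] for j in range(k + 1, n))) / R[k][k]
    return x
```

Computing Qᴴ b separately, as done here, is the simplest variant to read; applying the same projections to b as an extra column is a common refinement for better numerical stability. In a Newton step, A is the Jacobian matrix and b is the negated residual.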
Polynomial systems occur in many areas of science and engineering. Unlike general nonlinear systems, polynomial systems have an algebraic structure that makes it possible to compute all of their solutions. We describe our massively parallel predictor-corrector algorithms to track many solution paths of a polynomial homotopy. The data parallelism that provides the speedups stems from the evaluation and differentiation of the monomials in the same polynomial system at different data points, which are the points on the solution paths. Polynomial homotopies that have tens of thousands of solution paths can keep a sufficiently large number of threads occupied. Our accelerated code combines the reverse mode of algorithmic differentiation with double double and quad double precision to compute more accurate results faster.
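The data parallelism described above can be seen in a small sketch: one routine evaluates a polynomial system and its Jacobian at a single point, and the same routine is applied independently to every point on the solution paths. The sparse representation (a polynomial as a list of `(coefficient, exponents)` terms) and the function names are assumptions for illustration. Derivatives are computed directly here for readability rather than with the reverse mode the abstract mentions.

```python
# A polynomial is a list of terms (coefficient, exponents), where exponents is a
# tuple with one nonnegative integer per variable; a system is a list of polynomials.

def eval_and_diff(system, x):
    """Return the values and the Jacobian matrix of `system` at the point `x`."""
    n = len(x)
    values, jacobian = [], []
    for poly in system:
        value = 0j
        grad = [0j] * n
        for coeff, exps in poly:
            # Value of the term coeff * x_0^e_0 * ... * x_{n-1}^e_{n-1}.
            term = coeff
            for xi, e in zip(x, exps):
                term *= xi ** e
            value += term
            # Partial derivative with respect to x_k: multiply by e_k and lower
            # the k-th exponent by one (computed directly, not in reverse mode).
            for k in range(n):
                if exps[k] == 0:
                    continue
                d = coeff * exps[k]
                for i, (xi, e) in enumerate(zip(x, exps)):
                    d *= xi ** (e - 1 if i == k else e)
                grad[k] += d
        values.append(value)
        jacobian.append(grad)
    return values, jacobian


def eval_at_points(system, points):
    """Evaluate and differentiate the same system at many points.

    The points are independent of each other, so each iteration of this loop
    could be mapped to its own thread block on a GPU.
    """
    return [eval_and_diff(system, x) for x in points]
```

For example, the system x² + y - 3 = 0, xy - 2 = 0 is `[[(1, (2, 0)), (1, (0, 1)), (-3, (0, 0))], [(1, (1, 1)), (-2, (0, 0))]]`; at the point (1, 2) both values are zero and the Jacobian is [[2, 1], [2, 1]]. With tens of thousands of paths, the list of points is long enough to occupy many threads.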
“…Because our computations are geared towards extended precision arithmetic, which carries a higher cost per operation, we can afford a fine granularity in our parallel algorithms. Compared to our previous GPU implementations in [37,38], we have removed the restrictions on the dimensions and are now able to solve problems involving several thousands of variables. The performance investigation involves mixing the memory-bound polynomial evaluation and differentiation with the compute-bound linear system solving.…”
In order to compensate for the higher cost of double double and quad double arithmetic when solving large polynomial systems, we investigate the application of the NVIDIA Tesla K20C general purpose graphics processing unit. The focus of this paper is on Newton's method, which requires the evaluation of the polynomials, their derivatives, and the solution of a linear system to compute the update to the current approximation for the solution. The reverse mode of algorithmic differentiation for a product of variables is rewritten in a binary tree fashion so that all threads in a block can collaborate in the computation. For double arithmetic, the evaluation and differentiation problem is memory bound, whereas for complex quad double arithmetic the problem is compute bound. With acceleration we can double the dimension and get results that are twice as accurate in about the same time.
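For a product of variables x_0 x_1 ⋯ x_{n-1}, the reverse mode yields all n partial derivatives from prefix products (the forward sweep) and suffix products (the reverse sweep): the derivative with respect to x_k is (x_0 ⋯ x_{k-1})(x_{k+1} ⋯ x_{n-1}), with no divisions. The sketch below computes those prefix and suffix products with a Hillis-Steele scan, one possible binary-tree arrangement that needs ⌈log₂ n⌉ rounds, each round updating all entries independently (one thread per entry on a GPU, in this hypothetical mapping). It simulates the rounds sequentially in Python; the kernel in the abstract may organize the tree differently, and the function names are hypothetical.

```python
def inclusive_scan_products(xs):
    """Hillis-Steele inclusive scan with multiplication: out[i] = xs[0] * ... * xs[i].

    The stride doubles each round, and every entry reads only values from the
    previous round, so all entries of one round could be updated in parallel.
    """
    out = list(xs)
    stride = 1
    while stride < len(out):
        prev = out[:]  # values of the previous round (a synchronization point)
        for i in range(stride, len(out)):
            out[i] = prev[i - stride] * prev[i]
        stride *= 2
    return out


def product_and_gradient(xs):
    """Value of x_0 * ... * x_{n-1} and all n partial derivatives, without divisions."""
    n = len(xs)
    fwd = inclusive_scan_products(xs)              # fwd[k] = x_0 * ... * x_k
    bwd = inclusive_scan_products(xs[::-1])[::-1]  # bwd[k] = x_k * ... * x_{n-1}
    grad = []
    for k in range(n):
        left = fwd[k - 1] if k > 0 else 1
        right = bwd[k + 1] if k < n - 1 else 1
        grad.append(left * right)                  # product of all x_i with i != k
    return fwd[-1], grad
```

For example, `product_and_gradient([2, 3, 5, 7])` returns `(210, [105, 70, 42, 30])`. The Hillis-Steele scan does O(n log n) multiplications in total instead of the O(n) of a sequential sweep, trading extra work for fewer rounds, which suits a block of cooperating threads; because no divisions are used, a zero variable is handled without special cases.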