Orthogonalization on a General Purpose Graphics Processing Unit with Double Double and Quad Double Arithmetic

Proceedings of the 2015 International Workshop on Parallel Symbolic Computation

2015

Self Cite

Numerical continuation methods track a solution path defined by a homotopy. The systems we consider are defined by polynomials in several variables with complex coefficients. For larger dimensions and degrees, the numerical conditioning worsens and hardware double precision often becomes insufficient to reach the end of the solution path. With double double and quad double arithmetic, we can solve larger problems that we could not solve with hardware double arithmetic, but at a higher computational cost. This cost overhead can be compensated by acceleration on a Graphics Processing Unit (GPU). We describe our implementation and report on computational results on benchmark polynomial systems.
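To make the notion of double double arithmetic concrete, here is a minimal sketch in Python, not the authors' GPU code. It assumes a double double is stored as a (high, low) pair of hardware doubles and uses the standard error-free addition of Knuth; the function names `two_sum`, `quick_two_sum`, and `dd_add` are illustrative, and `dd_add` is the simpler, less accurate of the usual addition variants.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and a + b = s + e."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Error-free addition, valid when |a| >= |b| or a == 0."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two double doubles given as (high, low) pairs of floats.

    Assumption: this is the simple variant that adds the low parts
    directly; more careful variants exist.
    """
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


one = (1.0, 0.0)
tiny = (1e-20, 0.0)

# In hardware doubles the small term is lost entirely.
lost = (1.0 + 1e-20) - 1.0

# In double double arithmetic the low part retains it.
total = dd_add(one, tiny)
recovered = dd_add(total, (-1.0, 0.0))

print(lost)       # 0.0
print(total)      # (1.0, 1e-20)
print(recovered)  # (1e-20, 0.0)
```

Each double double addition costs several hardware operations, which illustrates the cost overhead that the abstract proposes to offset with GPU acceleration.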

Section: Introduction (mentioning)

confidence: 99%


Accelerating polynomial homotopy continuation on a graphics processing unit with double double and quad double arithmetic

Proceedings of the 2015 International Workshop on Parallel Symbolic Computation

2015

Self Cite

“…The linear systems in each Newton step we solve in the least squares sense via the modified Gram-Schmidt method. In [24] and [25] our computations were executed on randomly generated regular data sets. In [27], we integrated and improved the evaluation and differentiation codes to run Newton's method on some selected benchmark polynomial systems.…”
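The least squares solve via modified Gram-Schmidt mentioned in this statement can be sketched as follows. This is a sequential illustration in plain Python with hardware complex numbers, not the accelerated extended-precision code; the names `mgs_qr` and `least_squares` are hypothetical, and forming Q^H b in a separate step (rather than orthogonalizing b as an extra column) is a simplifying assumption.

```python
def mgs_qr(columns):
    """Modified Gram-Schmidt on a list of column vectors.

    Returns (Q, R): Q is a list of orthonormal columns and R is an
    upper triangular matrix stored as a list of rows.
    Assumes the columns are linearly independent.
    """
    n = len(columns)
    V = [list(col) for col in columns]
    Q = []
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        norm = sum(abs(v) ** 2 for v in V[k]) ** 0.5
        R[k][k] = norm
        q = [v / norm for v in V[k]]
        Q.append(q)
        # Remove the component along q from every remaining column
        # right away; this is what distinguishes the modified method
        # from classical Gram-Schmidt.
        for j in range(k + 1, n):
            R[k][j] = sum(qi.conjugate() * vi for qi, vi in zip(q, V[j]))
            V[j] = [vi - R[k][j] * qi for vi, qi in zip(V[j], q)]
    return Q, R


def least_squares(columns, b):
    """Solve min ||A x - b||_2 where A has the given columns."""
    Q, R = mgs_qr(columns)
    n = len(columns)
    # y = Q^H b
    y = [sum(qi.conjugate() * bi for qi, bi in zip(q, b)) for q in Q]
    # Back substitution on the upper triangular system R x = y.
    x = [0j] * n
    for i in reversed(range(n)):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


# A 3-by-2 complex system whose right-hand side is A times (1+1j, 2).
a1 = [1 + 0j, 1j, 0j]
a2 = [0j, 1 + 0j, 1 + 0j]
b = [1 + 1j, 1 + 1j, 2 + 0j]
x = least_squares([a1, a2], b)
print(x)
```

Because the right-hand side lies in the column space of the matrix, the computed solution should agree with (1+1j, 2) up to rounding.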

Section: Introduction (mentioning)

confidence: 99%

Tracking Many Solution Paths of a Polynomial Homotopy on a Graphics Processing Unit in Double Double and Quad Double Arithmetic

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium

2015

Self Cite

Polynomial systems occur in many areas of science and engineering. Unlike general nonlinear systems, the algebraic structure of a polynomial system makes it possible to compute all of its solutions. We describe our massively parallel predictor-corrector algorithms to track many solution paths of a polynomial homotopy. The data parallelism that provides the speedups stems from the evaluation and differentiation of the monomials in the same polynomial system at different data points, which are the points on the solution paths. Polynomial homotopies that have tens of thousands of solution paths can keep a sufficiently large number of threads occupied. Our accelerated code combines the reverse mode of algorithmic differentiation with double double and quad double precision to compute more accurate results faster.
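As a toy illustration of predictor-corrector path tracking, the sketch below follows the two solution paths of a one-variable homotopy. Everything specific here is an assumption made for the example: the start system g, the target system f, the constant `GAMMA`, the Euler predictor, the fixed step size, and the fixed number of Newton corrections. The paths are tracked one after the other in hardware complex arithmetic, whereas the abstract describes tracking many paths in parallel in extended precision.

```python
# Arbitrary fixed complex constant (assumption); a generic complex
# factor keeps the paths away from singular points.
GAMMA = complex(0.6, 0.8)


def g(x):
    return x * x - 1            # start system, known roots +1 and -1


def dg(x):
    return 2 * x


def f(x):
    return x * x - 5 * x + 6    # target system, roots 2 and 3


def df(x):
    return 2 * x - 5


def h(x, t):
    """Homotopy: equals GAMMA * g at t = 0 and f at t = 1."""
    return GAMMA * (1 - t) * g(x) + t * f(x)


def h_x(x, t):
    """Partial derivative of h with respect to x."""
    return GAMMA * (1 - t) * dg(x) + t * df(x)


def h_t(x, t):
    """Partial derivative of h with respect to t."""
    return f(x) - GAMMA * g(x)


def track(x, steps=200, newton_iterations=4):
    """Track one path from t = 0 to t = 1 with a fixed step size."""
    for k in range(steps):
        t = k / steps
        # Predictor: one Euler step along dx/dt = -h_t / h_x.
        x = x - (h_t(x, t) / h_x(x, t)) / steps
        # Corrector: Newton's method at the new value of t.
        t_next = (k + 1) / steps
        for _ in range(newton_iterations):
            x = x - h(x, t_next) / h_x(x, t_next)
    return x


ends = [track(complex(start)) for start in (1.0, -1.0)]
print(ends)
```

Each path is independent of the others, which is the source of the data parallelism the abstract refers to: the same polynomials are evaluated and differentiated at different points.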

“…Because our computations are geared towards extended precision arithmetic which carry a higher cost per operation, we can afford a fine granularity in our parallel algorithms. Compared to our previous GPU implementations in [37,38], we have removed the restrictions on the dimensions and are now able to solve problems involving several thousands of variables. The performance investigation involves mixing the memory-bound polynomial evaluation and differentiation with the compute-bound linear system solving.…”

Section: Introduction (mentioning)

confidence: 99%

GPU Acceleration of Newton's Method for Large Systems of Polynomial Equations in Double Double and Quad Double Arithmetic

2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security

2014

Self Cite

In order to compensate for the higher cost of double double and quad double arithmetic when solving large polynomial systems, we investigate the application of the NVIDIA Tesla K20C general purpose graphics processing unit. The focus of this paper is on Newton's method, which requires the evaluation of the polynomials, their derivatives, and the solution of a linear system to compute the update to the current approximation for the solution. The reverse mode of algorithmic differentiation for a product of variables is rewritten in a binary tree fashion so all threads in a block can collaborate in the computation. For double arithmetic, the evaluation and differentiation problem is memory bound, whereas for complex quad double arithmetic the problem is compute bound. With acceleration we can double the dimension and get results that are twice as accurate in about the same time.
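The reverse mode of algorithmic differentiation for a product of variables can be illustrated with forward and backward partial products. The sketch below is the plain sequential form, not the binary tree rewriting described in the abstract, and the function name `product_gradient` is hypothetical. It computes the product and all of its partial derivatives with about 3n multiplications and no divisions, so it also works when some variables are zero.

```python
def product_gradient(x):
    """Return the value of x[0] * ... * x[n-1] and its gradient.

    Assumes x is a non-empty list of numbers (real or complex).
    """
    n = len(x)
    # forward[i] holds x[0] * ... * x[i-1], with forward[0] = 1.
    forward = [1] * n
    for i in range(1, n):
        forward[i] = forward[i - 1] * x[i - 1]
    value = forward[n - 1] * x[n - 1]

    grad = [0] * n
    backward = 1  # holds x[i+1] * ... * x[n-1] during the sweep
    for i in reversed(range(n)):
        # The derivative with respect to x[i] is the product of all
        # the other variables: those before i times those after i.
        grad[i] = forward[i] * backward
        backward = backward * x[i]
    return value, grad


value, grad = product_gradient([2, 3, 5, 7])
print(value)  # 210
print(grad)   # [105, 70, 42, 30]
```

The two sweeps are inherently sequential, which suggests why a tree-shaped arrangement of the partial products is attractive when many threads in a block are meant to collaborate.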