Year: 2020
DOI: 10.1007/978-3-030-64616-5_4
Multiple-Precision BLAS Library for Graphics Processing Units

Abstract: The binary32 and binary64 floating-point formats provide good performance on current hardware, but also introduce a rounding error in almost every arithmetic operation. Consequently, the accumulation of rounding errors in large computations can cause accuracy issues. One way to prevent these issues is to use multiple-precision floating-point arithmetic. This paper presents a new library of basic linear algebra operations with multiple precision for graphics processing units. The library is written in CUDA C/C+…
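The following is a minimal, standalone C++ sketch (not taken from the paper or its library) illustrating the rounding-error accumulation the abstract describes: summing many copies of 0.1, which is not exactly representable in binary floating point, drifts away from the exact result in binary32, while binary64 stays much closer; a multiple-precision format would shrink the error further.

// Illustration only: rounding-error accumulation in binary32 vs. binary64.
// 0.1 has no exact binary representation, so every addition rounds;
// over many additions the binary32 sum drifts visibly from the exact value.
#include <cstdio>

int main() {
    const int n = 10000000;          // sum 0.1 ten million times; exact result is 1,000,000
    float  sum32 = 0.0f;
    double sum64 = 0.0;
    for (int i = 0; i < n; ++i) {
        sum32 += 0.1f;               // rounds to the nearest binary32 value at each step
        sum64 += 0.1;                // also rounds, but with ~2^-53 relative error
    }
    std::printf("binary32 sum: %.2f\n", sum32);   // noticeably off from 1000000.00
    std::printf("binary64 sum: %.2f\n", sum64);   // very close to 1000000.00
    return 0;
}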

Cited by 6 publications (2 citation statements)
References 22 publications
“…We directly use the data reported in work [57] as the GPU results. Work [58] proposes a new multiple-precision algorithm that is faster than CAMPARY when more than 106 bits of precision are required. However, in that work all libraries are tested in their single-precision versions on the NVIDIA GTX architecture, which is much slower than the double-precision version.…”
Section: Comparisons With Other Architectures (mentioning, confidence: 99%)
“…The authors show up to 19× speedup on a Fermi-based Tesla C2075 GPU over a consumer-grade quad-core Sandy Bridge CPU running MPFR, dropping to ∼1× for 424-bit mantissas. MPRES-BLAS [37] presents GPU acceleration of APFP dense linear algebra, showing ∼2× speedup over CAMPARY for GEMM and reporting ∼100–120 MOp/s at 424-bit precision on a GTX 1080 GPU. Lei et al. [38] implement an APFP accelerator on a Virtex 6 FPGA and report 11.6× speedup for 1024-bit multiplication over MPFR running on a dual-core Core i3 530 Clarkdale-based CPU.…”
Section: Related Work (mentioning, confidence: 99%)