2020
DOI: 10.25209/2079-3316-2020-11-3-61-84

Multiple-precision matrix-vector multiplication on graphics processing units

Abstract: We consider a parallel implementation of matrix-vector multiplication (GEMV, Level 2 of the BLAS) for graphics processing units (GPUs) using multiple-precision arithmetic based on the residue number system. In our GEMV implementation, element-wise operations with multiple-precision vectors and matrices consist of several parts, each of which is calculated by a separate CUDA kernel. This feature eliminates branch divergence when performing the sequential parts of multiple-precision operations and allows the …
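To make the kernel decomposition described in the abstract concrete, below is a minimal CUDA sketch of how an element-wise multiple-precision multiplication can be split into two kernels: a fully data-parallel pass over the RNS digits and a separate pass over signs and exponents, so that the branchy bookkeeping never sits inside the digit-parallel kernel. The mp_array_t layout, the modulus count RNS_MODULI, and the kernel names are illustrative assumptions, not the authors' actual code.

// A minimal sketch (not the paper's implementation) of splitting an
// element-wise multiple-precision product into separate CUDA kernels.

#include <cuda_runtime.h>

#define RNS_MODULI 8                       // number of RNS moduli (assumed)
__constant__ int moduli[RNS_MODULI];       // the moduli m_0 .. m_{k-1}

struct mp_array_t {                        // structure-of-arrays layout (assumed)
    int *digits;                           // n * RNS_MODULI residues
    int *sign;                             // n signs
    int *exp;                              // n exponents
};

// Kernel 1: multiply the RNS digits of each element pairwise.
// One thread per (element, modulus) pair; no divergent branches here.
__global__ void mp_mul_digits(mp_array_t r, mp_array_t x, mp_array_t y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;     // global digit index
    if (i < n * RNS_MODULI) {
        int m = moduli[i % RNS_MODULI];
        r.digits[i] = (int)(((long long)x.digits[i] * y.digits[i]) % m);
    }
}

// Kernel 2: combine signs and exponents.
// One thread per element; the (cheap) branchy bookkeeping is isolated here,
// so it cannot cause divergence inside the digit-parallel kernel above.
__global__ void mp_mul_signs_exps(mp_array_t r, mp_array_t x, mp_array_t y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        r.sign[i] = x.sign[i] ^ y.sign[i];
        r.exp[i]  = x.exp[i] + y.exp[i];
    }
}

Launching the two kernels back to back over the same arrays reproduces the structure the abstract describes: each kernel executes one homogeneous part of the operation, so threads within a warp follow the same control path.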

Cited by 5 publications (5 citation statements). References 24 publications.
“…The double double arithmetic of CAMPARY performs best for the problem of matrix-vector multiplication. In quad double precision, the authors of [9] write "the CAMPARY library is faster than our implementation; however as the precision increases the execution time of CAMPARY also increases significantly."…”
Section: On Alternatives to CAMPARY
confidence: 94%
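For context, "double double arithmetic" represents a number as an unevaluated sum of two hardware doubles, roughly doubling the significand to about 2 × 53 bits. The sketch below shows the standard error-free transformation (two_sum) and a simplified double-double addition; the names and the simplified renormalization are illustrative and are not CAMPARY's API.

// A brief illustration of double-double arithmetic: hi + lo is an
// unevaluated sum of two doubles; two_sum is the error-free building block.

#include <cstdio>

struct dd { double hi, lo; };

// Error-free addition: hi + lo == a + b exactly, with hi = fl(a + b).
__host__ __device__ inline dd two_sum(double a, double b)
{
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);   // rounding error of the sum
    return { s, e };
}

// Double-double addition (simplified "sloppy" variant).
__host__ __device__ inline dd dd_add(dd a, dd b)
{
    dd s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;
    return two_sum(s.hi, s.lo);           // renormalize into (hi, lo)
}

int main()
{
    dd x = { 1.0, 1e-20 };                 // 1 + 1e-20, representable as a dd
    dd y = dd_add(x, x);
    printf("hi = %.17g, lo = %.17g\n", y.hi, y.lo);   // prints 2 and 2e-20
    return 0;
}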
“…The authors of [9] compare CAMPARY and CUMP [18] to their GPU implementation of multiprecision arithmetic based on the multiple residue number system. The double double arithmetic of CAMPARY performs best for the problem of matrix-vector multiplication.…”
Section: On Alternatives to CAMPARY
confidence: 99%
“…The other drawback is the limited size of the exponents (limited to the 11 bits of the 64-bit hardware doubles), which will prohibit the computation with infinitesimal values. In the context of GPU acceleration, recent work of [16] makes an interesting comparison with double double arithmetic: "The double double arithmetic of CAMPARY performs best for the problem of matrix-vector multiplication." Concerning quad double precision, the authors of [16] write: "the CAMPARY library is faster than our implementation; however as the precision increases the execution time of CAMPARY also increases significantly."…”
Section: Multiprecision Arithmetic
confidence: 99%
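The exponent limitation quoted above can be seen directly: each component of a multiple-double number is an ordinary IEEE double with an 11-bit exponent, so the dynamic range stays near 1e-308 .. 1e+308 no matter how many doubles are chained together, and sufficiently small ("infinitesimal") values underflow to zero. A tiny host-side check, for illustration only:

// Illustration of the fixed exponent range inherited from hardware doubles.

#include <cstdio>
#include <cfloat>

int main()
{
    printf("DBL_MAX_EXP = %d, DBL_MIN = %g\n", DBL_MAX_EXP, DBL_MIN);
    double tiny = 1e-320;                        // already subnormal
    double smaller = tiny * 1e-10;               // below the subnormal range
    printf("1e-320 * 1e-10 = %g\n", smaller);    // prints 0: underflow to zero
    return 0;
}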
“…In the context of GPU acceleration, recent work of [16] makes an interesting comparison with double double arithmetic: "The double double arithmetic of CAMPARY performs best for the problem of matrix-vector multiplication." Concerning quad double precision, the authors of [16] write: "the CAMPARY library is faster than our implementation; however as the precision increases the execution time of CAMPARY also increases significantly." The advantage of multiple double arithmetic is that simple counts of the number of floating-point operations quantify the cost overhead precisely and the flops metrics for performance are directly applicable.…”
Section: Multiprecision Arithmetic
confidence: 99%
“…[11][12][13][14] Because all data information can be expressed in vector units, the matrix computation of data processing can be performed in a highly parallel dot-product manner as vector-matrix multiplication (VMM), which has distinct advantages and is now being developed mainly as an accelerator for inference, especially in neural network systems. [15][16][17][18][19][20][21] In addition, it has the potential to be a reconfigurable analog processor for signal processing, as each variable matrix element can be directly encoded in a matrix array to enable individual input signals to be transformed in a VMM manner. [3][4][5][6] Among big data-driven information applications, controlling traffic flow, especially in urban road networks and in conjunction with autonomous driving technology, is becoming a promising field.…”
Section: Introduction
confidence: 99%