2008
DOI: 10.1109/ipdps.2008.4536485

Evaluation and tuning of the Level 3 CUBLAS for graphics processors

Abstract: The increase in performance of the last generations of graphics processors (GPUs) …

Cited by 69 publications (48 citation statements)
References 5 publications
“…The CUBLAS library is distributed with CUDA, and it may not be the fastest implementation at a given time, but it gives an optimized performance [3].…”
Section: Algorithm 1 Calculate a Euclidean Distance Matrix With Matri… (mentioning)
confidence: 99%
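
For context, the section cited above computes a Euclidean distance matrix through matrix products. A minimal sketch of that general technique with cuBLAS follows, assuming single precision, column-major storage, and the cublasSgemm interface; the function name, variable names, and dimensions are illustrative and are not taken from the cited paper.

    /* Sketch: the cross term of a squared Euclidean distance matrix,
       D(i,j) = ||x_i||^2 + ||y_j||^2 - 2 * x_i . y_j,
       computed with a single cuBLAS GEMM. Names and sizes are illustrative. */
    #include <cublas_v2.h>

    /* X: n x d points, Y: m x d points, both column-major on the GPU.
       On return, dD holds the n x m cross term -2 * X * Y^T. */
    void cross_term(cublasHandle_t handle, const float *dX, const float *dY,
                    float *dD, int n, int m, int d)
    {
        const float alpha = -2.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T,
                    n, m, d,
                    &alpha, dX, n,
                    dY, m,
                    &beta, dD, n);
        /* The row norms ||x_i||^2 and column norms ||y_j||^2 would then be
           added with a small custom kernel (or further BLAS calls). */
    }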
“…This process is still under investigation. Another method to hide some of the GPU overhead may involve a hybrid technique in which GPU and CPU operations are performed in parallel, such as that described by Barrachina et al [17].…”
Section: E Analysis Of Performance Improvement (mentioning)
confidence: 99%
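
The hybrid technique mentioned above overlaps GPU and CPU work. The sketch below shows one generic way such a split can be arranged for a GEMM, relying on the fact that a cuBLAS call with device data returns control to the host before the GPU finishes; the column split, the host BLAS (cblas_sgemm), and the names are assumptions, not the scheme of Barrachina et al. [17].

    /* Sketch of a generic hybrid split for C = A * B (n x n, column-major):
       the GPU computes the first 'split' columns while the CPU computes the
       rest. The split point, names, and host BLAS are assumptions. */
    #include <cblas.h>
    #include <cublas_v2.h>

    void hybrid_sgemm(cublasHandle_t handle, int n, int split,
                      const float *hA, const float *hB, float *hC,  /* host copies */
                      const float *dA, const float *dB, float *dC)  /* device copies */
    {
        const float one = 1.0f, zero = 0.0f;
        /* Asynchronous with respect to the host: control returns immediately. */
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, split, n,
                    &one, dA, n, dB, n, &zero, dC, n);
        /* Meanwhile the CPU computes the remaining n - split columns. */
        cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    n, n - split, n, 1.0f, hA, n,
                    hB + (size_t)split * n, n,
                    0.0f, hC + (size_t)split * n, n);
        /* Retrieve the GPU part; this copy waits for the GEMM to complete. */
        cublasGetMatrix(n, split, sizeof(float), dC, n, hC, n);
    }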
“…As we have discussed before, if the matrices are in CPU memory one can use padding, e.g., as in [5]. We have to allocate a bigger dimension of matrix in GPU memory, put zeroes in the extra elements, then transfer the data from CPU to GPU and then call the Kernel.…”
Section: Performance (mentioning)
confidence: 99%
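
A minimal sketch of the padding step described in this quote, assuming column-major single-precision data; the padding multiple of 32, the helper names, and the allocation strategy are illustrative choices, not the values used in [5].

    /* Sketch of the padding scheme from the quote: allocate a padded matrix on
       the GPU, zero it, copy the real m x n data into the top-left corner, and
       later run the kernel on the padded dimensions. The multiple of 32 is an
       assumed block size. */
    #include <cuda_runtime.h>

    static int pad_up(int n, int multiple)
    {
        return ((n + multiple - 1) / multiple) * multiple;
    }

    /* hA: m x n column-major host matrix. Returns a device pointer to an
       mp x np zero-padded copy; *mp and *np receive the padded dimensions. */
    float *upload_padded(const float *hA, int m, int n, int *mp, int *np)
    {
        *mp = pad_up(m, 32);
        *np = pad_up(n, 32);
        size_t bytes = (size_t)(*mp) * (*np) * sizeof(float);
        float *dA = NULL;
        cudaMalloc((void **)&dA, bytes);
        cudaMemset(dA, 0, bytes);                        /* zeroes in the extra elements */
        cudaMemcpy2D(dA, (size_t)(*mp) * sizeof(float),  /* destination pitch in bytes */
                     hA, (size_t)m * sizeof(float),      /* source pitch in bytes */
                     (size_t)m * sizeof(float), n,       /* m floats per column, n columns */
                     cudaMemcpyHostToDevice);
        return dA;
    }

The GEMM would then be called on the padded dimensions, and only the leading m x n block of the result is meaningful and copied back to the host.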