2010
DOI: 10.1007/978-3-642-11515-8_10

Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures

Cited by 197 publications (162 citation statements)
References 4 publications
“…Note that the definition of the first operator depends on the storage format used for the convective operator. An example for the sliced ELLPACK format is shown in [21].…”
Section: Governing Equations and Numerical Methods (mentioning)
confidence: 99%
“…It consists of sorting the rows by the number of entries and then dividing the matrix into slices, which are themselves stored using the ELLPACK format. More details can be found in [18,21]. A performance comparison between the CSR and sELL formats in our application context is presented in the next section.…”
Section: Intra-device Optimization (mentioning)
confidence: 99%
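To make the sliced ELLPACK construction described in the excerpt above concrete, here is a minimal Python sketch; it is not taken from any of the cited implementations, and the function name build_sliced_ellpack, the slice_size parameter, and the toy matrix are illustrative assumptions. It sorts rows by their number of entries, groups them into slices, and pads each slice only to the width of its own longest row.

```python
import numpy as np

def build_sliced_ellpack(rows, slice_size=4):
    """Sketch of a sliced ELLPACK (SELL) layout.

    Each row is a list of (column, value) pairs. Rows are sorted by their
    number of entries, partitioned into slices of `slice_size` rows, and
    each slice is padded only to the length of its own longest row.
    """
    # Sort row indices by descending number of nonzeros.
    order = sorted(range(len(rows)), key=lambda r: len(rows[r]), reverse=True)

    slices = []
    for start in range(0, len(order), slice_size):
        block = order[start:start + slice_size]
        width = max(len(rows[r]) for r in block)  # padding width for this slice only
        cols = np.zeros((len(block), width), dtype=np.int32)
        vals = np.zeros((len(block), width), dtype=np.float64)
        for i, r in enumerate(block):
            for j, (c, v) in enumerate(rows[r]):
                cols[i, j] = c
                vals[i, j] = v
        slices.append({"rows": block, "cols": cols, "vals": vals})
    return slices

# Toy matrix with rows of very different lengths.
rows = [
    [(0, 1.0)],
    [(0, 2.0), (1, 3.0), (2, 4.0), (3, 5.0)],
    [(1, 6.0), (2, 7.0)],
    [(3, 8.0)],
]
for s in build_sliced_ellpack(rows, slice_size=2):
    print(s["rows"], s["vals"].shape)
```

The permutation stored in each slice ("rows") is what allows the result of an SpMV over the sorted layout to be scattered back into the original row order.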
“…Monakov et al. [5] put forward a sliced ELL format and used auto-tuning to find the optimal configuration for better performance. Zheng and Gu [6] proposed the bisection ELL (BiELL) and bisection JAD (BiJAD) formats, based on the ELL and JAD formats, for optimizing SpMV on GPUs.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the resulting sliced ELLPACK format (SELL or SELL-C where C denotes the size of the row blocks [123,127]), the overhead is no longer determined by the matrix row containing the largest number of nonzeros, but by the row with the largest number of nonzero elements in the respective block.…”
Section: Graphics Accelerators (mentioning)
confidence: 99%
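As a small numeric illustration of that padding argument (the row lengths, the block size C = 4, and the helper padding_overhead are invented for this sketch, and rows are assumed already sorted by length), plain ELLPACK pads every row to the global maximum row length, while SELL-C pads only within each block of C rows:

```python
def padding_overhead(row_lengths, block_size):
    """Padded storage for SELL-C: each block of `block_size` rows is padded
    to its own longest row, instead of the global maximum (plain ELLPACK)."""
    total = 0
    for start in range(0, len(row_lengths), block_size):
        block = row_lengths[start:start + block_size]
        total += max(block) * len(block)
    return total

row_lengths = [1, 2, 2, 3, 3, 4, 50, 50]      # one pair of long rows
ell = max(row_lengths) * len(row_lengths)      # plain ELLPACK: pad to global max
sell = padding_overhead(row_lengths, block_size=4)
print(ell, sell)  # 400 padded entries (ELLPACK) vs. 212 (SELL-C with C = 4)
```

With these made-up row lengths the long rows inflate only their own block, which is exactly the effect the excerpt describes.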
“…, which incorporates optimized versions of CG, mainly focused on the SpMV operation on the same architectures. Specifically, the CSR, BCSR, and CSB formats [192,193,194,195,196] were used for the multicore architectures, while the ELLPACK, ELLR_T, and SELL-P formats [197,198,199,200,201] were used for the GPUs, and "CUDA kernel fusion" was also included. In addition, the study used DP arithmetic, although, as a final complement, the use of SP was also tested for the GPU (Kepler) and a general-purpose processor (Intel Bridge).…”
Section: Analysis of Parallel Architectures (unclassified)