High Performance Computing in Science and Engineering, Garching/Munich 2009 (2010)
DOI: 10.1007/978-3-642-13872-0_4

Fast 3D Block Parallelisation for the Matrix Multiplication Prefix Problem

Cited by 4 publications (3 citation statements)
References 8 publications
“…Gradl et al [24] presented a parallel prefix algorithm for accumulation of matrix multiplications in quantum control. Waldherr et al [25] and Auckenthaler [26] showed later that prefix scan parallelization of this operation is outperformed by a sequential prefix scan with parallel matrix multiplication operator. These applications of prefix scan resulted in neither tuning nor designing a scan algorithm for operators where computation time is significantly larger than communication.…”
Section: Specific Prefix Scan Operators
Citation type: mentioning
confidence: 99%
“…Gradl et al [23] presented a parallel prefix algorithm for accumulation of matrix multiplications in quantum control. Waldherr et al [24] and Auckenthaler [25] showed later that prefix scan parallelization of this operation is outperformed by a sequential prefix scan with parallel matrix multiplication operator. These applications of prefix scan resulted in neither tuning nor designing a scan algorithm for operators where computation time is significantly larger than communication.…”
Section: Specific Prefix Scan Operators
Citation type: mentioning
confidence: 99%
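
The two strategies contrasted in the statements above can be illustrated with a minimal sketch. It is not the 3D block algorithm of the cited chapter: it only shows what a prefix scan over matrix products computes, and why a scan-level parallelisation performs more multiplications than the sequential scan. The function names (`matmul`, `sequential_prefix`, `doubling_prefix`) and the left-to-right product order are assumptions made for illustration; the doubling scheme is a generic Hillis-Steele style scan, executed serially here.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b (this is the operator that
    could itself be parallelised in the 'sequential scan' strategy)."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def sequential_prefix(mats: List[Matrix]) -> List[Matrix]:
    """Prefix products P_k = A_0 A_1 ... A_k using n - 1 multiplications."""
    if not mats:
        return []
    out = [mats[0]]
    for m in mats[1:]:
        out.append(matmul(out[-1], m))
    return out


def doubling_prefix(mats: List[Matrix]) -> Tuple[List[Matrix], int]:
    """Hillis-Steele style scan. The products inside one round are
    independent of each other, so they could run in parallel; here they
    run serially and are counted. Relies only on associativity."""
    cur = list(mats)
    n = len(cur)
    step = 1
    count = 0
    while step < n:
        nxt = list(cur)
        for i in range(step, n):
            # Left factor covers the earlier matrices, so order is preserved.
            nxt[i] = matmul(cur[i - step], cur[i])
            count += 1
        cur = nxt
        step *= 2
    return cur, count
```

For four matrices the sequential scan needs three multiplications, while the doubling scan needs five spread over two rounds. That extra work is one reason a scan-level parallelisation may lose to a sequential scan whose individual multiplications are parallelised, as the citing authors report.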
“…While this reduces the wall clock time, the CPU time will be increased to handle the overheads of parallelisation. This will affect the observed speed-up in a way that depends upon matrix size [19] as matrix exponentiation is more easily parallelised than our efficient algorithm. The speed gain achievable will depend on both the spin system chosen and the precise code used, but our MATLAB simulations for the 3-spin homonuclear system corresponding to the three ¹³C nuclei in alanine [20] indicate that the calculation of subpropagators can be sped up by a factor of around 18, while full propagators (which require an additional full matrix multiplication for each sub-propagator) are sped up by a factor of around 13.…”
Section: Errors and Speed Gains
Citation type: mentioning
confidence: 99%