2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis 2010
DOI: 10.1109/sc.2010.48

Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems

Abstract: As tile linear algebra algorithms continue achieving high performance on shared-memory multicore architectures, it is a challenging task to make them scalable on distributed-memory multicore cluster machines. The main contribution of this paper is the extension to the distributed-memory environment of the previous work done by Hadri et al. on Communication-Avoiding QR (CA-QR) factorizations for tall and skinny matrices (initially done on shared-memory multicore systems). The fine granularity of tile al…

Cited by 25 publications (20 citation statements)
References: 11 publications
“…And the same apply equally to GPU-accelerated implementations [2] as well as the codes designed specifically for distributed memory clusters of multicore nodes [20].…”
Section: Related Work and Relevant Contributions
Citation type: mentioning
confidence: 93%
“…First, several local binary trees are applied in parallel, one within each node, and then a global binary tree is applied for the final reduction across nodes. Yet another implementation [19] also uses a hierarchical approach, and it also uses a 1D block distribution. The main difference is that the first level of reduction is performed with a flat tree within each node.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…The main difference is that the first level of reduction is performed with a flat tree within each node. Note that the hierarchical algorithm (HQR) used previously [5] can be parametrized to implement this original algorithm [19] as well as a more efficient variant with cyclic layout. The HQR algorithm [5] is the reference algorithm for multilevel clusters: it provides a flexible approach, and allows one to use various elimination trees (Flat, Binary, Fibonacci or Greedy) at each level.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
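
The two statements above contrast flat and binary reduction trees applied within a node and then across nodes. The following is a minimal serial sketch of that idea for a tall and skinny matrix, assuming NumPy and simulating "nodes" as groups of row blocks. The function names (`flat_tree_r`, `binary_tree_r`, `hierarchical_r`, `normalize_signs`) are hypothetical and are not taken from the cited papers or from any library; no MPI or tile kernels are involved, and only the R factor is computed.

```python
import numpy as np


def flat_tree_r(blocks):
    """Flat tree: fold row blocks into a running R factor, one block at a time."""
    r = np.linalg.qr(blocks[0], mode="r")
    for block in blocks[1:]:
        r = np.linalg.qr(np.vstack([r, block]), mode="r")
    return r


def binary_tree_r(blocks):
    """Binary tree: factor each block, then merge R factors pairwise."""
    rs = [np.linalg.qr(block, mode="r") for block in blocks]
    while len(rs) > 1:
        # An unpaired trailing R factor is simply re-factored and carried up.
        rs = [
            np.linalg.qr(np.vstack(rs[i:i + 2]), mode="r")
            for i in range(0, len(rs), 2)
        ]
    return rs[0]


def hierarchical_r(blocks, blocks_per_node, local_tree):
    """Two levels: a local tree inside each simulated node, then a global binary tree."""
    node_rs = [
        local_tree(blocks[i:i + blocks_per_node])
        for i in range(0, len(blocks), blocks_per_node)
    ]
    return binary_tree_r(node_rs)


def normalize_signs(r):
    """Make the diagonal non-negative so R factors can be compared directly."""
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return r * signs[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((64, 4))   # tall and skinny test matrix
    blocks = np.split(a, 8)            # 8 row blocks of shape (8, 4)

    reference = normalize_signs(np.linalg.qr(a, mode="r"))
    for local_tree in (flat_tree_r, binary_tree_r):
        r = normalize_signs(hierarchical_r(blocks, 2, local_tree))
        print(local_tree.__name__, np.allclose(r, reference))
```

For a full-rank matrix the R factor is unique up to the signs of its rows, so every tree shape should agree with a direct factorization after sign normalization. The trees differ in the order and grouping of the merges, which in a distributed setting determines how many messages cross node boundaries.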
“…This section briefly introduces the previous work of distributed tiled CAQR [9] and its existing performance bottleneck.…”
Section: Introduction
Citation type: mentioning
confidence: 99%