2012
DOI: 10.1007/978-3-642-31464-3_67

Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction

Abstract. The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tre…
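
The contrast the abstract draws between domino-like (flat) panel processing and tree reduction can be illustrated with a minimal sketch, assuming NumPy and using the well-known TSQR formulation as a stand-in for the paper's tile kernels (it is not claimed to be the authors' implementation). A tall-skinny matrix is split into row tiles, each tile is factored independently, and the triangular factors are merged pairwise in a binary tree. The function name `tsqr_r` and its `tile_rows` parameter are hypothetical. A flat elimination over p tiles needs p - 1 dependent merge steps, while a binary tree needs only ceil(log2 p) levels.

```python
import numpy as np


def tsqr_r(A, tile_rows):
    """Hypothetical sketch: R factor of a tall-skinny A via binary-tree reduction.

    Returns (R, levels), where levels is the number of sequential tree levels.
    """
    m, n = A.shape
    if tile_rows < n:
        raise ValueError("each row tile should have at least n rows")
    # Leaves: every row tile is factored independently (parallel in principle).
    rs = [np.linalg.qr(A[i:i + tile_rows])[1] for i in range(0, m, tile_rows)]
    levels = 0
    while len(rs) > 1:
        nxt = []
        # Merge neighbouring triangles pairwise; merges within a level are independent.
        for j in range(0, len(rs) - 1, 2):
            nxt.append(np.linalg.qr(np.vstack((rs[j], rs[j + 1])))[1])
        if len(rs) % 2:
            nxt.append(rs[-1])  # an unpaired factor moves up a level unchanged
        rs = nxt
        levels += 1
    return rs[0], levels
```

For example, a 64 x 4 matrix split into eight 8-row tiles is reduced in three tree levels instead of seven sequential flat steps, and the resulting R agrees with NumPy's own R factor up to the sign of each row. Production tile codes merge triangles with structured kernels rather than dense QR of a stacked block; the dense call here only keeps the sketch short.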

Cited by 13 publications (19 citation statements); references 21 publications.

“…Mixed precisions techniques are also used for further approximating contributions from farther particles in the context of fast multipole methods as well as LQCD computation [13], resulting in speeding up the computation while reducing memory traffic. Moreover, tree reduction techniques were naturally identified as a way to increase concurrency while reducing data motion on multicore architecture [14]–[16]. In this work, we present detailed power analysis of these mixed precision and tree reduction codes.…”
Section: Related Work (mentioning; confidence: 99%)
“…The two-stage approach was applied to the TRD (Triangular [i.e., tridiagonal] Reduction) [34] and to SVD [35,53,54] in combination with tile algorithms and runtime scheduling based on data dependences between tasks that operate on the tiles. This resulted in very good performance but has never been used to compute the singular vectors.…”
Section: Related Work (mentioning; confidence: 99%)
“…The caveat is that the reductions can be done easily to a band form, instead of the proper bi-diagonal matrix or a tri-diagonal matrix (with a single subdiagonal). The solution is to reduce to the band form first, and then produce the proper form through the process of bulge chasing, i.e., successive elimination of the subdiagonal entries by a series of Householder transformations [34,35,52–55]. Because both the reduction to the band form and the bulge chasing process can be implemented in a parallel and cache-efficient manner, the two-stage procedure is an order of magnitude faster than the legacy approach of LAPACK, which relies heavily on Level 2 BLAS operations, is memory bound, and therefore inefficient.…”
Section: Introduction (mentioning; confidence: 99%)
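
The first stage mentioned in the last quotation, reduction to band form, can be sketched as below. This is a minimal dense NumPy illustration under stated assumptions, not the tile algorithm of the cited works: it alternates a QR factorization of each column panel (a left transformation) with an LQ factorization of the row block to its right (a right transformation, computed here as a QR of the transpose). The result is an upper band matrix of bandwidth `b` with the same singular values as the input. The function name `reduce_to_band` and the block size `b` are hypothetical, and the second stage (bulge chasing down to bidiagonal form) is not shown.

```python
import numpy as np


def reduce_to_band(A, b):
    """Hypothetical sketch: reduce A (m x n, m >= n) to upper band form.

    Entries outside i <= j <= i + b are exactly zero on return.
    """
    B = np.array(A, dtype=float, copy=True)
    m, n = B.shape
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    for k in range(0, n, b):
        # Left transform: QR of the column panel zeroes it below the diagonal.
        Q, R = np.linalg.qr(B[k:, k:k + b], mode="complete")
        B[k:, k:k + b] = R
        B[k:, k + b:] = Q.T @ B[k:, k + b:]
        if k + b < n:
            # Right transform: LQ of the row block right of the panel, done as QR
            # of its transpose; it leaves a lower-triangular block, i.e. the band.
            Q2, R2 = np.linalg.qr(B[k:k + b, k + b:].T, mode="complete")
            B[k:k + b, k + b:] = R2.T
            B[k + b:, k + b:] = B[k + b:, k + b:] @ Q2
    return B
```

Because only orthogonal transformations are applied, the singular values of `B` match those of `A`. Each update acts on a block of `b` columns or rows at once, so in a tuned implementation these updates are matrix-matrix operations rather than the Level 2 BLAS operations the quotation attributes to the legacy LAPACK path.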