2019
DOI: 10.2478/amcs-2019-0030

The Parallel Tiled WZ Factorization Algorithm for Multicore Architectures

Abstract: The aim of this paper is to investigate dense linear algebra algorithms on shared-memory multicore architectures. The design and implementation of a parallel tiled WZ factorization algorithm that can fully exploit such architectures are presented. Three parallel implementations of the algorithm are studied. The first relies only on exploiting multithreaded BLAS (basic linear algebra subprograms) operations. The second implementation, in addition to BLAS operations, employs the OpenMP standard to use the loop-…

Cited by 3 publications (5 citation statements). References 20 publications (28 reference statements).
“…QIF is known for the adaptability of its direct method for solving systems of linear equations. The factorization gives rise to parallel implicit elimination (PIE) for the solution of linear systems, which computes two matrix elements simultaneously (two columns at a time) for parallel implementation, unlike Gaussian elimination (GE), which computes one column at a time [13]. The stability of QIF comes from the matrix being centro-nonsingular (its central submatrices are nonsingular), which is far more reliable than any other type of factorization [8].…”
Section: Quadrant Interlocking Factorization
confidence: 99%
“…While LU factorization performs elimination serially in n − 1 steps, WZ factorization executes its components in parallel in n/2 steps if n is even, or (n − 1)/2 steps if n is odd. WZ factorization computes two matrix elements simultaneously (two columns at a time), unlike LU factorization, which computes one column at a time [12,13,36]. Unlike the WZ factorization, the LU factorization is not unique, but block LU factorization with higher-order diagonal blocks gives analytic results similar to those of WZ factorization [37].…”
Section: WZ Factorization Versus LU Factorization
confidence: 99%
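The two-columns-at-a-time elimination described above can be sketched in plain Python. This is a hypothetical illustration of the basic (untiled) WZ scheme for an even-order matrix, not the paper's parallel tiled implementation: each step k uses pivot rows k and n−1−k together, so only n/2 elimination steps are needed, versus n−1 for LU.

```python
def wz_factorize(A):
    """Sketch of WZ (quadrant interlocking) factorization: A = W * Z.

    Step k uses pivot rows k and n-1-k to zero columns k and n-1-k
    of every row strictly between them, so an even-order matrix is
    reduced in n/2 steps. Requires the matrix to be centro-nonsingular
    (each central 2x2 pivot block nonsingular); no pivoting is done.
    """
    n = len(A)
    assert n % 2 == 0, "sketch handles even n only"
    Z = [row[:] for row in A]  # working copy, becomes the Z-matrix
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n // 2):
        k2 = n - 1 - k
        # 2x2 pivot block taken from rows k and k2
        a, b = Z[k][k], Z[k2][k]
        c, d = Z[k][k2], Z[k2][k2]
        det = a * d - b * c
        for i in range(k + 1, k2):
            # Cramer's rule: multipliers zeroing Z[i][k] and Z[i][k2]
            w1 = (Z[i][k] * d - Z[i][k2] * b) / det
            w2 = (Z[i][k2] * a - Z[i][k] * c) / det
            W[i][k], W[i][k2] = w1, w2
            for j in range(n):
                Z[i][j] -= w1 * Z[k][j] + w2 * Z[k2][j]
    return W, Z
```

The inner loop over rows i is where the parallelism the excerpt mentions lives: every row update at step k is independent, so the updates can be distributed across threads (or, as in the paper, grouped into tiles handled by BLAS calls).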
“…Therefore, the methods described in these references lack generality [31, 38–41]. As reported in [32, 42–44], relevant features, such as the number of loop iterations of multilevel nested loops and the number of nesting levels, were extracted from the compiler's intermediate representation to construct the loop-selection assessment model for multilevel nested loops. Compared with models based on the thread-level speculation technique, these models improve the parallelization of multilevel nested loops to some extent.…”
Section: Related Work
confidence: 99%
“…Compared with models based on the thread-level speculation technique, these models improve the parallelization of multilevel nested loops to some extent. However, as the number of nested loop levels in a program increases, the iteration dependencies between loops become complicated, which still prevents these models from achieving the desired performance improvement [32, 42–44]. The authors in [45–47] proposed frameworks for handling misspeculation based on loop cost in the compiler.…”
Section: Related Work
confidence: 99%
“…LU factorization is well known to be implemented in the LAPACK library to exploit standard software library architectures [17]. WZ factorization offers parallelization in solving both sparse and dense linear systems, enhancing performance using OpenMP, CUDA, BLAS, or EDK HW/SW co-design architectures [1,14]. Yalamov [42] showed that WZ factorization is faster on a computer with a parallel architecture than other matrix factorization methods.…”
Section: Introduction
confidence: 99%