2016
DOI: 10.1007/s10766-016-0441-6

Porting the PLASMA Numerical Library to the OpenMP Standard

Abstract: PLASMA is a numerical library intended as a successor to LAPACK for solving problems in dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for efficient multithreading of algorithms expressed in a serial fashion. QUARK is a superscalar scheduler and implements automatic parallelization by tracking data dependencies and resolving data hazards at runtime. Recently, this type of scheduling has been incorporated in the OpenMP standard, which makes it possible to transition PLASMA from the prop…
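
The scheduling model described in the abstract (serially written code, with the runtime tracking data dependencies between tasks) can be illustrated with OpenMP `task` constructs and `depend` clauses. The following is a minimal sketch, not code from PLASMA: the kernels `produce`, `scale`, and `accumulate` and the function `run_pipeline` are hypothetical stand-ins for tile operations, chosen only to show a read-after-write chain.

```c
/* Hypothetical kernels standing in for tile operations (not PLASMA routines). */
static void produce(double *x) { *x = 2.0; }
static void scale(const double *x, double *y) { *y = 10.0 * *x; }
static void accumulate(const double *y, double *z) { *z += *y; }

/*
 * The tasks are submitted in serial program order. Each depend clause
 * declares how a task accesses its data, and the OpenMP runtime orders
 * the tasks so that every read sees the preceding write.
 * Without OpenMP enabled, the pragmas are ignored and the code runs serially.
 */
double run_pipeline(void)
{
    double a = 0.0, b = 0.0, c = 1.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Writes a. */
        #pragma omp task depend(out: a) shared(a)
        produce(&a);

        /* Reads a, writes b: must wait for the first task. */
        #pragma omp task depend(in: a) depend(out: b) shared(a, b)
        scale(&a, &b);

        /* Reads b, updates c: must wait for the second task. */
        #pragma omp task depend(in: b) depend(inout: c) shared(b, c)
        accumulate(&b, &c);
    } /* All tasks have completed by the barrier that ends the single region. */

    return c;
}
```

Calling `run_pipeline()` returns the same value whether or not the compiler is invoked with OpenMP support (for example `-fopenmp` with GCC), because the `depend` clauses force the three tasks to execute in their serial order.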

Cited by 31 publications (28 citation statements)
References 31 publications
“…However, the PLASMA library is currently being ported from the QUARK task scheduler to the OpenMP task scheduler, which may change PLASMA's performance slightly, but the stable version is still based on QUARK. On the other hand, in the work of YarKhan et al. (2017), we can see in Fig. 15 that the QUARK-based PLASMA implementation and its OpenMP version achieve almost identical performance, both somewhat worse than MKL (on 20 cores of the Haswell processor, which is similar to our environment).…”
Section: Related Work (mentioning)
confidence: 60%
“…Although in several cases the tasking model has replaced nested parallelism to exploit irregular applications [3,39], the latter still outperforms the former in some cases. This is, for example, the case of imbalanced loops, where dynamic scheduling or tasking may suffer from poor cache behavior and low data reuse due to the inability to bind tasks to cores [8].…”
Section: Nested Parallelism in HPC (mentioning)
confidence: 99%
“…Both utilize the Cholesky decomposition to capture the mean and covariance of the system state. Overall, the GPS-aided SINU is a real-time application that can exploit two levels of parallelism: in the outer level, the computation of the two functionalities (i.e., computing position, velocity and orientation, and estimating errors) can be performed in parallel; in the inner level, the computation of the Cholesky decomposition used in the Kalman Filter [39] can be further parallelized. The use of nested parallel regions can, however, prevent the scheduler from fulfilling priorities or ensuring work-conserving executions.…”
Section: GPS-aided SINU (mentioning)
confidence: 99%
“…OpenMP 4.5 further extended the tasking capabilities. For example, OpenMP 4.5 added task priorities that are critical for obtaining high performance using some of our PLASMA routines [22]. These OpenMP standards are supported by popular compilers, including the GNU Compiler Collection (GCC) and the Intel C Compiler (ICC).…”
Section: OpenMP Standard (mentioning)
confidence: 99%
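
The task priorities mentioned in the statement above correspond to the `priority` clause introduced in OpenMP 4.5. The sketch below is an illustration under assumptions, not a PLASMA routine: `run_prioritized`, `NT`, and the arithmetic are hypothetical, and merely mimic a "panel" task on the critical path followed by independent "update" tasks. The clause is only a scheduling hint, so the result does not depend on it.

```c
#define NT 4 /* hypothetical number of trailing tiles */

/*
 * One high-priority "panel" task feeds NT lower-priority "update" tasks.
 * priority() is a hint to the runtime; correctness comes from the depend
 * clauses alone. In OpenMP 4.5 the usable range is capped by the
 * OMP_MAX_TASK_PRIORITY environment variable, whose default is 0.
 * Without OpenMP enabled, the pragmas are ignored and the code runs serially.
 */
double run_prioritized(void)
{
    double panel = 0.0;
    double trailing[NT] = {0.0};
    double sum = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Critical-path task: ask the runtime to favor it. */
        #pragma omp task depend(out: panel) priority(10)
        panel = 3.0;

        for (int j = 0; j < NT; j++) {
            /* Each update reads the panel result and writes its own tile. */
            #pragma omp task depend(in: panel) depend(out: trailing[j]) \
                             priority(1) firstprivate(j)
            trailing[j] = panel * (j + 1);
        }
    } /* All tasks have completed by the barrier that ends the single region. */

    for (int j = 0; j < NT; j++)
        sum += trailing[j];
    return sum;
}
```

Compiling this with OpenMP enabled requires a compiler that implements the OpenMP 4.5 `priority` clause; the citing statement names GCC and ICC as compilers supporting these standards.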