2016 IEEE High Performance Extreme Computing Conference (HPEC)
DOI: 10.1109/hpec.2016.7761591
LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi

Cited by 12 publications (7 citation statements)
References 18 publications
“…Besides extending ideas from the batched linear algebra routines, manycore algorithms can also be built on ideas from the hybrid linear algebra algorithms. This was demonstrated for the case of KNL processors in [17]. The difficult-to-parallelize tasks are the panel factorizations (see Section 3), and these are the tasks offloaded for execution to the CPUs in the hybrid algorithms.…”
Section: Related Work
confidence: 94%
“…The difficult-to-parallelize tasks are the panel factorizations (see Section 3), and these are the tasks offloaded for execution to the CPUs in the hybrid algorithms. As the KNL is self-hosted (i.e., there is no additional CPU host), a virtual CPU abstraction was created from a subset of the KNL cores that enabled hybrid algorithms to run efficiently on homogeneous manycore processors [17]. The panel factorizations can be done in parallel with the trailing matrix updates in factorizations like QR, LU, and Cholesky (see Section 3), which is used in the hybrid algorithms to overlap CPU work and CPU-to-GPU communications with GPU work on the trailing matrix updates.…”
Section: Related Work
confidence: 99%
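The excerpt above describes the structure the hybrid algorithms exploit: each step factors a narrow, hard-to-parallelize panel and then performs a compute-rich trailing-matrix update, and the two can be overlapped. As an illustration only (not the paper's implementation), the following minimal NumPy sketch of a right-looking blocked Cholesky factorization shows where that panel/trailing split occurs; in the hybrid scheme the panel step would run on the CPU (or the "virtual CPU" subset of KNL cores) while the device updates the trailing matrix.

```python
import numpy as np

def blocked_cholesky(A, nb=64):
    """Right-looking blocked Cholesky (lower triangular), sequential sketch.

    Shows the panel-factorization / trailing-update structure referenced
    above; block size `nb` is an illustrative tuning parameter.
    """
    n = A.shape[0]
    L = np.tril(A.copy())  # work only on the lower triangle
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # Panel factorization: the small, latency-bound step that the
        # hybrid algorithms offload to the CPU side.
        L[k:k+kb, k:k+kb] = np.linalg.cholesky(L[k:k+kb, k:k+kb])
        if k + kb < n:
            # Triangular solve completing the panel: L21 = A21 * inv(L11^T).
            L[k+kb:, k:k+kb] = np.linalg.solve(
                L[k:k+kb, k:k+kb], L[k+kb:, k:k+kb].T).T
            # Trailing-matrix update (GEMM/SYRK-rich, maps well to the
            # manycore device): A22 -= L21 * L21^T, lower triangle only.
            L[k+kb:, k+kb:] -= np.tril(
                L[k+kb:, k:k+kb] @ L[k+kb:, k:k+kb].T)
    return L
```

Because the trailing update of step k is independent of the panel of step k+1 once the first `nb` columns of the update are done, a lookahead scheduler can run the next panel concurrently with the bulk of the update, which is the overlap the citation statement describes.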
“…We compare in Figure 8 the best configurations among the ones presented previously with the native implementation of the Cholesky factorization from the Intel MKL for the Intel KNL platform and with the PLASMA library. It is important to note that we were not able to compare our approach with the MAGMA library for Intel KNL architectures (see [25] for more details) because the corresponding software package is not yet available. For the sake of clarity, we only report the results obtained with the best setup for each library.…”
Section: Experimental Evaluation on the Intel KNL Platform
confidence: 99%
“…Their early results, as shown in their paper, achieved around 80% parallel efficiency in scalability experiments with the Caffe application. Haidar et al (2016) have studied the scalability aspects of algorithms such as lower-upper (LU), QR, and Cholesky factorisations. They proposed a programming model to efficiently utilise manycore machines such as the KNL.…”
Section: Related Work
confidence: 99%