OpenMP Thread Affinity for Matrix Factorization on Multicore Systems

Bylina, Beata; Bylina, Jarosław

doi:10.15439/2017f231

Cited by 6 publications (9 citation statements)

References 4 publications (4 reference statements)


“…In many numerical algorithms where dependencies between data are very complicated, even such tools as efficient optimizing compilers are not able to transform the code to use the potential of modern processors. The authors of [3], [1], present algorithms for solving systems of equations, trying to improve their performance, in particular in parallel. Improvement in performance was obtained by appropriate transformation of the underlying algorithm using loop tiling and appropriate data structures.…”

Section: Related Work

mentioning

confidence: 99%
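Loop tiling, mentioned in the statement above, splits each loop into blocks so that a small working set is reused while it is still in cache. The following is a minimal C/OpenMP sketch on matrix multiplication, not code from the cited works; the function names `matmul_naive` and `matmul_tiled` and the tile size `TILE = 32` are illustrative assumptions, and a real code would tune the tile size to the target cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size chosen for illustration; tune it to the cache. */
#define TILE 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference version: C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

/* Tiled version (hypothetical name): the i, k and j loops are split into
   TILE-sized blocks so that blocks of A, B and C are reused from cache. */
void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    /* Each thread owns whole row-tiles of C, so no element is written
       by two threads. */
    #pragma omp parallel for schedule(static)
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < min_sz(ii + TILE, n); i++)
                    for (size_t k = kk; k < min_sz(kk + TILE, n); k++) {
                        const double a = A[i * n + k];
                        for (size_t j = jj; j < min_sz(jj + TILE, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Compiled with `-fopenmp` (GCC or Clang), the outer tile loop runs in parallel; without it the pragma is ignored and the code runs serially with the same result.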


Influence of loop transformations on performance and energy consumption of the multithreded WZ factorization

Bylina, Bylina, Piekarz

2022

Annals of Computer Science and Information Systems

Self Cite


High-level loop transformations are a key instrument for effectively exploiting the resources of modern architectures. Energy consumption on multicore architectures is one of the major issues connected with high-performance computing. We examine the impact of four loop transformation strategies on performance and energy consumption. The investigated strategies are loop fission, loop interchange (permutation), strip-mining, and loop tiling. Additionally, column-wise and row-wise storage formats for dense matrices are considered. Parallelization and vectorization are implemented using OpenMP directives. The WZ factorization algorithm is used as the test case. The selected loop transformation strategies are compared on an Intel architecture, namely Cascade Lake. It has been shown that for WZ factorization, an example of an application to which loop transformations can be applied, optimizing for high performance can also be an effective strategy for improving energy efficiency. Our results also show that block size selection in loop tiling has a significant impact on energy consumption.
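The abstract contrasts column-wise and row-wise storage and lists loop interchange among the transformations. As a hedged illustration on a simpler kernel than WZ factorization, the sketch below computes y = Ax for both storage formats and interchanges the loops so that the innermost loop always walks contiguous memory; `#pragma omp simd` marks that loop for vectorization. The macros `RM`/`CM` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accessors for element (i, j) of an n x n matrix. */
#define RM(A, n, i, j) ((A)[(i) * (n) + (j)])   /* row-wise storage    */
#define CM(A, n, i, j) ((A)[(j) * (n) + (i)])   /* column-wise storage */

/* Row-wise A: j innermost, so A and x are read with stride 1. */
void matvec_rowwise(size_t n, const double *A, const double *x, double *y)
{
    assert(A && x && y);
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        #pragma omp simd reduction(+:s)
        for (size_t j = 0; j < n; j++)
            s += RM(A, n, i, j) * x[j];
        y[i] = s;
    }
}

/* Column-wise A: loops interchanged (j outer, i inner) so the innermost
   loop again walks contiguous memory of A and y. */
void matvec_colwise(size_t n, const double *A, const double *x, double *y)
{
    assert(A && x && y);
    for (size_t i = 0; i < n; i++)
        y[i] = 0.0;
    /* The j loop is left serial: every j iteration updates all of y. */
    for (size_t j = 0; j < n; j++) {
        const double xj = x[j];
        #pragma omp simd
        for (size_t i = 0; i < n; i++)
            y[i] += CM(A, n, i, j) * xj;
    }
}
```

Parallelizing the outer j loop of the column-wise version as written would create a data race on `y`, which is one reason the storage format and the loop order have to be chosen together.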


Section: Related Work

mentioning

confidence: 99%

“…However, to parallelize them efficiently, the programmer has to make some decisions about applying various transformations. An example of such loops is matrix algorithms, like matrix multiplication or different kinds of factorizations, widely investigated in the literature [3], [1].…”

Section: Introduction

mentioning

confidence: 99%


“…In this research, we control the thread affinity using the environment variable KMP_AFFINITY on the CPU and PHI_KMP_AFFINITY on the MIC. We studied the OpenMP thread mapping strategies for matrix decompositions on multicore architectures in our work [3]. The results showed that the choice of the scatter affinity has a measurable impact on the execution time of the matrix factorisations on the CPU.…”

Section: Thread Mapping

mentioning

confidence: 99%
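To see what a setting such as `KMP_AFFINITY=scatter` actually does on a given machine, it can help to record where each thread runs. The sketch below is an illustration under assumptions: it relies on the glibc/Linux function `sched_getcpu()`, and the function name `record_thread_cpus` is hypothetical. The result is only a snapshot; unbound threads may migrate afterwards.

```c
#define _GNU_SOURCE      /* needed for sched_getcpu() in glibc */
#include <assert.h>
#include <sched.h>       /* sched_getcpu(): Linux/glibc-specific */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: stores in cpu_of[t] the logical CPU that thread t
   is running on at the moment of the call (-1 if sched_getcpu fails) and
   returns the number of entries written (team size, capped at
   max_threads; 1 when compiled without OpenMP). */
int record_thread_cpus(int *cpu_of, int max_threads)
{
    int team = 1;
    assert(cpu_of && max_threads > 0);
    #pragma omp parallel
    {
        int t = 0;
#ifdef _OPENMP
        t = omp_get_thread_num();
        #pragma omp single
        team = omp_get_num_threads();
#endif
        if (t < max_threads)
            cpu_of[t] = sched_getcpu();
    }
    return team < max_threads ? team : max_threads;
}
```

With the Intel or LLVM OpenMP runtimes, running the calling program as `KMP_AFFINITY=scatter ./prog` and then `KMP_AFFINITY=compact ./prog` should show different CPU assignments. GCC's libgomp does not read `KMP_AFFINITY`; there the standard `OMP_PROC_BIND=spread` with `OMP_PLACES=cores` is roughly the portable counterpart of scatter.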

“…There is no single thread mapping strategy that suits all applications. We studied the OpenMP thread mapping strategies for matrix decompositions on multicore architectures in our work [3]. The results showed that the choice of thread affinity has a measurable impact on the execution time of the matrix factorisations.…”

mentioning

confidence: 99%
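Because no single mapping suits every application, a practical approach is to time the same kernel under different bindings. The sketch below uses the standard OpenMP 4.0 `proc_bind` clause to run a dot product once with `close` and once with `spread`. It is a hypothetical micro-benchmark (the names `dot_close`, `dot_spread` and `now_seconds` are illustrative), not the methodology of the cited work, and which policy wins depends on the machine and on `OMP_PLACES`.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time with OpenMP; processor time from clock() as a serial
   fallback when compiled without OpenMP. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* proc_bind(close): threads are placed on places near the primary thread. */
double dot_close(const double *x, const double *y, size_t n, double *secs)
{
    assert(x && y && secs);
    double s = 0.0;
    const double t0 = now_seconds();
    #pragma omp parallel for proc_bind(close) reduction(+:s) schedule(static)
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    *secs = now_seconds() - t0;
    return s;
}

/* proc_bind(spread): threads are distributed across the available places. */
double dot_spread(const double *x, const double *y, size_t n, double *secs)
{
    assert(x && y && secs);
    double s = 0.0;
    const double t0 = now_seconds();
    #pragma omp parallel for proc_bind(spread) reduction(+:s) schedule(static)
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    *secs = now_seconds() - t0;
    return s;
}
```

For meaningful timings, the arrays should be much larger than the caches and each variant should be repeated several times; a single short run mostly measures thread start-up.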

“…3 shows the performance of the LU factorisation as a function of the number of threads for the matrix size of 19456 on the Intel Xeon Phi in native mode and on the hybrid CPU-MIC platform in automatic offload mode, for KMP_AFFINITY=scatter on the CPU and different values of PHI_KMP_AFFINITY. We can see… [Figure: panels 'offload, MKL LU without piv., KMP_AFFINITY=scatter/scatter' and 'offload, MKL LU without piv., KMP_AFFINITY=scatter/none', plotted for 60, 120, 180, and 240 threads. Caption: The performance of the LU factorisation without pivoting (MKL library's implementation) in the automatic offload mode for different matrix sizes, numbers of threads, and thread mapping settings.]…”

mentioning

confidence: 99%
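For orientation, the following is a minimal right-looking LU factorisation without pivoting in C with OpenMP, the kind of computation measured in the statement above. It is a hedged sketch, not MKL's implementation (MKL's routines are blocked and considerably more elaborate); the function name `lu_nopiv` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: in-place LU without pivoting of an n x n row-major
   matrix A. On return the strict lower triangle holds L (unit diagonal
   implied) and the upper triangle holds U. Returns 0, or -1 if a zero
   pivot is met. */
int lu_nopiv(size_t n, double *A)
{
    assert(A);
    for (size_t k = 0; k < n; k++) {
        const double pivot = A[k * n + k];
        if (pivot == 0.0)
            return -1;                 /* without pivoting this is fatal */
        /* Rows below k only read row k and write themselves, so the
           trailing update can be shared among threads. */
        #pragma omp parallel for schedule(static)
        for (size_t i = k + 1; i < n; i++) {
            const double l = A[i * n + k] / pivot;
            A[i * n + k] = l;
            for (size_t j = k + 1; j < n; j++)
                A[i * n + j] -= l * A[k * n + j];
        }
    }
    return 0;
}
```

The parallel region is opened once per step k, so for small trailing matrices the threading overhead dominates; this is one reason the number of threads and their placement matter for such factorisations.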


An Experimental Evaluation of the OpenMP Thread Mapping for LU Factorisation on Xeon Phi Coprocessor and on Hybrid CPU-MIC Platform

Bylina, Bylina

2018

SCPE

Self Cite


Efficient thread mapping relies upon matching the behaviour of the application with the system characteristics. The main aim of this paper is to evaluate the influence of OpenMP thread mapping on the computational performance of matrix factorisations on the Intel Xeon Phi coprocessor and on hybrid CPU-MIC platforms. The authors consider parallel LU factorisations with and without pivoting, both from the MKL (Math Kernel Library). The results show that the choice of thread affinity, the number of threads, and the execution mode have a measurable impact on the performance and scalability of the LU factorisations.
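The abstract distinguishes LU factorisation with and without pivoting. As a hedged sketch of the pivoted variant (not MKL's code), the function below performs partial (row) pivoting so that PA = LU, recording swaps in a 0-based pivot array in the spirit of LAPACK's `getrf`; the names `lu_partial_pivot` and `absd` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static double absd(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical helper: in-place LU with partial pivoting, P A = L U, for an
   n x n row-major matrix A. piv[k] is the row swapped with row k at step k
   (0-based). Returns 0, or -1 if an exactly zero pivot column is found. */
int lu_partial_pivot(size_t n, double *A, size_t *piv)
{
    assert(A && piv);
    for (size_t k = 0; k < n; k++) {
        /* Pick the row i >= k with the largest |A[i][k]|. */
        size_t p = k;
        for (size_t i = k + 1; i < n; i++)
            if (absd(A[i * n + k]) > absd(A[p * n + k]))
                p = i;
        piv[k] = p;
        if (A[p * n + k] == 0.0)
            return -1;
        /* Swap whole rows, including the L part already computed. */
        if (p != k)
            for (size_t j = 0; j < n; j++) {
                const double tmp = A[k * n + j];
                A[k * n + j] = A[p * n + j];
                A[p * n + j] = tmp;
            }
        const double pivot = A[k * n + k];
        /* Same parallel trailing update as in the unpivoted case. */
        #pragma omp parallel for schedule(static)
        for (size_t i = k + 1; i < n; i++) {
            const double l = A[i * n + k] / pivot;
            A[i * n + k] = l;
            for (size_t j = k + 1; j < n; j++)
                A[i * n + j] -= l * A[k * n + j];
        }
    }
    return 0;
}
```

The pivot search and row swap are serial steps inside every iteration, which adds synchronisation compared with the unpivoted factorisation and is one plausible reason the two variants can respond differently to thread count and placement.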


Characterizing the Sharing Behavior of Applications Using Software Transactional Memory

Pasqualin, Diener, Bois, et al.

2021

Benchmarking, Measuring, and Optimizing

