Proceedings of Workshops of HPC Asia 2018
DOI: 10.1145/3176364.3176374

OpenMP-based parallel implementation of matrix-matrix multiplication on the Intel Knights Landing

Cited by 9 publications (4 citation statements)
References 9 publications
“…CPU/GPU heterogeneous parallel programming model is based on a heterogeneous computing platform where computing power involving both GPUs and CPUs is considered [52]. OpenMP supports multi-threaded concurrent execution of tasks on multi-core CPUs [53]. The independence of CPU cores allows different tasks to be performed simultaneously among different OpenMP threads.…”
Section: CPU/GPU Heterogeneous Computing 231 GPU Parallel Architecture (mentioning)
confidence: 99%
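
The point in the statement above, that independent tasks can run at the same time on different OpenMP threads, can be illustrated with a minimal sketch. It is not taken from the cited or citing papers; the function name `sum_and_max` and the choice of tasks are assumptions made for illustration. Each `section` writes only to its own variable, so the two tasks do not race. If the compiler is run without OpenMP support, the pragmas are ignored and the two blocks simply run one after the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): computes the sum and the
 * maximum of x[0..n-1] as two independent tasks. With OpenMP enabled,
 * each section may be executed by a different thread. */
void sum_and_max(const double *x, size_t n, double *sum_out, double *max_out)
{
    assert(x != NULL && n > 0);

    double sum = 0.0;   /* written only by the first section  */
    double max = x[0];  /* written only by the second section */

    #pragma omp parallel sections shared(sum, max)
    {
        #pragma omp section
        {
            for (size_t i = 0; i < n; ++i)
                sum += x[i];
        }
        #pragma omp section
        {
            for (size_t i = 1; i < n; ++i)
                if (x[i] > max)
                    max = x[i];
        }
    }

    *sum_out = sum;
    *max_out = max;
}
```

Calling `sum_and_max` on the array `{1.0, 4.0, 2.5}` yields a sum of 7.5 and a maximum of 4.0 regardless of how many threads the runtime provides.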
“…Jiang et al [16] propose a three-level blocking DGEMM algorithm to improve data-locality in the Sunway TaihuLight supercomputer. Lim et al [19] optimize a DGEMM OpenMP fork-join version by choosing the proper block size and thread affinity to the Intel Xeon Phi. Abdelfattah et al [4] propose HGEMM to improve the performance in GPU Tensor Cores.…”
Section: Related Work (mentioning)
confidence: 99%
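
To make the idea of a blocked, fork-join OpenMP DGEMM concrete, here is a minimal sketch. It is not the implementation of Lim et al.; the function name `dgemm_blocked`, the square row-major layout, and the single level of blocking are assumptions chosen to keep the example short. The block size `bs` is the tunable parameter the statement refers to. Thread affinity is not set in the code; it would normally be controlled through the runtime environment (for example the standard `OMP_PROC_BIND` variable).

```c
#include <assert.h>
#include <stddef.h>

static long min_long(long a, long b) { return a < b ? a : b; }

/* Sketch only: C += A * B for n x n row-major matrices, processed in
 * bs x bs blocks. The outer loop over row blocks is parallelized, so each
 * thread updates a disjoint set of rows of C and no synchronization is
 * needed inside the loop nest. */
void dgemm_blocked(long n, long bs, const double *A, const double *B, double *C)
{
    assert(n > 0 && bs > 0);
    assert(A != NULL && B != NULL && C != NULL);

    #pragma omp parallel for schedule(static)
    for (long ii = 0; ii < n; ii += bs) {
        for (long kk = 0; kk < n; kk += bs) {
            for (long jj = 0; jj < n; jj += bs) {
                long i_end = min_long(ii + bs, n);
                long k_end = min_long(kk + bs, n);
                long j_end = min_long(jj + bs, n);

                /* Multiply one block of A by one block of B. */
                for (long i = ii; i < i_end; ++i) {
                    for (long k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        for (long j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

In practice `bs` would be chosen so that the working blocks fit in a target cache level; a suitable value depends on the machine and is not specified here.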
“…For nodes that support hyperthreading, the granularity modifier specifies whether to pin OpenMP threads to physical cores (granularity=core) or logical cores (granularity=fine). Using granularity=thread enables distribution of OpenMP threads in a compact and/or scatter fashion [26]. For this work KMP_AFFINITY=granularity=fine was used as it prevented Matlab/Octave from over-allocating OpenMP threads to the same processor core as determined by monitoring the compute node with the Linux htop command during execution.…”
Section: OpenMP (mentioning)
confidence: 99%
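
`KMP_AFFINITY` is an environment variable read by the Intel OpenMP runtime, so it is set outside the program. As a complement, the sketch below shows one way a program could inspect the binding policy through the standard OpenMP API call `omp_get_proc_bind()`. The function names `binding_policy_name` and `print_binding_policy` are hypothetical. Whether a given `KMP_AFFINITY` setting is reflected in the reported value depends on the runtime and is an open assumption here, not something the quoted statement establishes.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns a label for the thread-binding policy that
 * applies to the next parallel region, or "no-openmp" when the program was
 * compiled without OpenMP support. */
const char *binding_policy_name(void)
{
#ifdef _OPENMP
    switch (omp_get_proc_bind()) {
    case omp_proc_bind_false:  return "false";
    case omp_proc_bind_true:   return "true";
    case omp_proc_bind_close:  return "close";
    case omp_proc_bind_spread: return "spread";
    default:                   return "other";
    }
#else
    return "no-openmp";
#endif
}

/* Writes the policy label to the given stream, one line per call. */
void print_binding_policy(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "proc_bind policy: %s\n", binding_policy_name());
}
```

Running the program with different values of the standard `OMP_PROC_BIND` variable (for example `close` or `spread`) is a simple way to confirm that the runtime picked up the intended policy before a long computation starts.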