2020
DOI: 10.1109/access.2019.2959905
A Hierarchical Data-Partitioning Algorithm for Performance Optimization of Data-Parallel Applications on Heterogeneous Multi-Accelerator NUMA Nodes

Abstract: Modern HPC platforms are highly heterogeneous, with tight integration of multicore CPUs and accelerators (such as Graphics Processing Units, Intel Xeon Phis, or Field-Programmable Gate Arrays) empowering them to address the twin critical concerns of performance and energy efficiency. Due to this inherent characteristic, processing elements contend for shared on-chip resources such as the Last Level Cache (LLC) and interconnect, and for shared nodal resources such as DRAM and PCI-E links, resulting in complexities …
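As a rough illustration of the kind of data partitioning the abstract refers to, the sketch below splits a data-parallel workload among a CPU and two accelerators in proportion to each device's measured throughput, so that under a constant-speed model all devices finish at roughly the same time. The device names, speeds, and workload size are hypothetical, and this is only a minimal proportional split, not the hierarchical, contention-aware algorithm the paper itself proposes.

/* Minimal sketch (not the paper's algorithm): split N work items among
 * heterogeneous devices in proportion to benchmarked throughput, so each
 * device's share takes about the same time under a constant-speed model.
 * Device names and speeds are assumed values for illustration only. */
#include <stdio.h>

#define NUM_DEVICES 3

int main(void) {
    const char *device[NUM_DEVICES] = {"CPU", "GPU0", "GPU1"};
    /* Benchmarked throughput in work items per second (hypothetical). */
    double speed[NUM_DEVICES] = {1.0e6, 4.0e6, 3.0e6};
    long n = 80000000;              /* total number of work items */
    long share[NUM_DEVICES];
    double total_speed = 0.0;
    long assigned = 0;

    for (int i = 0; i < NUM_DEVICES; i++)
        total_speed += speed[i];

    /* Proportional split; any rounding remainder goes to the last device. */
    for (int i = 0; i < NUM_DEVICES; i++) {
        share[i] = (long)(n * (speed[i] / total_speed));
        assigned += share[i];
    }
    share[NUM_DEVICES - 1] += n - assigned;

    for (int i = 0; i < NUM_DEVICES; i++)
        printf("%-5s gets %ld items (~%.1f s)\n",
               device[i], share[i], share[i] / speed[i]);
    return 0;
}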

Cited by 10 publications (4 citation statements). References 43 publications.
“…Most of the previous works do not address the problem of minimizing communication cost, which is an important related concept. Modern HPC systems have become more parallel and heterogeneous to accommodate the growing demands of performance and energy efficiency [25]. The programming community is facing a great challenge in respect of how to minimize data movement, which is the most dominant factor in performance and energy consumption [6], [8], [26].…”
Section: Discussion (citation type: mentioning, confidence: 99%)
“…Further work has demonstrated reconfigurable accelerators that rely on field programmable gate arrays (FPGAs) [40,70] or ASICs [81]. Consequently, past work has examined how job scheduling should consider heterogeneous resource requests [8,30], how the operating system (OS) and runtime should adapt [42,57], how to write applications for heterogeneous systems [8,32], how to partition data-parallel applications onto heterogeneous compute resources [48], how to consider the different fault tolerances of heterogeneous resources [41], how to fairly compare the performance of different heterogeneous systems [44], and what the impact of heterogeneous resources is on application performance [52,74,80].…”
Section: Background and Related Work, 2.1 Resource Heterogeneity in HPC (citation type: mentioning, confidence: 99%)
“…The schedulers dynamically distribute the processing tasks among all the resources to fully exploit all computing devices while minimizing load imbalance [67]. Other approaches aim at maximizing power-normalized performance under Amdahl's Law by considering dynamic workload variations and their classification [68], [69], by deploying data partitioning algorithms to minimize the parallel execution time of data-parallel applications on clusters of identical nodes of heterogeneous processors [70], or by using scheduling strategies and task decomposition to accelerate the training of large-scale CNNs by minimizing the waiting time on critical paths [71].…”
Section: B. Implementation Challenges of AI on SoC (citation type: mentioning, confidence: 99%)
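For reference, the data-partitioning approaches mentioned in the last statement (e.g., [70]) build on a simple load-balancing criterion that can be stated explicitly under the simplifying assumption of constant device speeds (the cited works use more refined functional performance models): with $n_i$ work items assigned to device $i$ of speed $s_i$, the parallel time is set by the slowest device and is minimized when shares are proportional to speeds.

\[
T_{\mathrm{par}} = \max_i \frac{n_i}{s_i}, \qquad \sum_i n_i = N,
\qquad
n_i^{\star} = \frac{s_i}{\sum_j s_j}\, N
\;\Longrightarrow\;
T_{\mathrm{par}}^{\star} = \frac{N}{\sum_j s_j}.
\]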