2021
DOI: 10.48550/arxiv.2105.01976
Preprint

GRAPHOPT: constrained-optimization-based parallelization of irregular graphs

Nimish Shah,
Wannes Meert,
Marian Verhelst

Abstract: Sparse, irregular graphs show up in various applications like linear algebra, machine learning, engineering simulations, robotic control, etc. These graphs have a high degree of parallelism, but their execution on parallel threads of modern platforms remains challenging due to the irregular data dependencies. The parallel execution performance can be improved by efficiently partitioning the graphs such that the communication and thread synchronization overheads are minimized without hurting the utilization of …
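The abstract frames parallelization as a balance between communication and synchronization overhead on one side and thread utilization on the other. As a minimal illustration of those two costs (not the paper's constrained-optimization formulation), the Python sketch below scores a given node-to-thread assignment by counting cross-partition edges and the load imbalance; the node names, unit work weights, and the helper name partition_cost are assumptions for illustration only.

```python
# Minimal sketch (not GRAPHOPT's constrained-optimization formulation):
# given a DAG and an assignment of nodes to per-thread partitions, count the
# two overheads the abstract mentions -- cross-partition edges (communication)
# and the spread of work across threads (utilization / load balance).

from collections import defaultdict

def partition_cost(edges, node_to_thread):
    """edges: iterable of (src, dst) pairs; node_to_thread: dict node -> thread id."""
    cross_edges = sum(1 for u, v in edges if node_to_thread[u] != node_to_thread[v])

    work_per_thread = defaultdict(int)
    for node, thread in node_to_thread.items():
        work_per_thread[thread] += 1          # 1 unit of work per node (assumption)

    imbalance = max(work_per_thread.values()) - min(work_per_thread.values())
    return cross_edges, imbalance

# Example: a 4-node diamond DAG split across 2 threads.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
assignment = {"a": 0, "b": 0, "c": 1, "d": 0}
print(partition_cost(edges, assignment))      # -> (2, 2)
```

A partitioner would search over assignments to drive both numbers down at once; GRAPHOPT does this with a constrained-optimization formulation rather than the naive counting shown here.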


Cited by 2 publications (7 citation statements)
References 43 publications
“…2(a) shows a common approach for accelerating DAGs, in which a DAG is partitioned into smaller subgraphs that can be executed on parallel cores, e.g., in a multicore CPU. This approach is used in works like [39], [44], [46]. The cores have to synchronize occasionally for race-free communication that happens through a shared memory structure like an L3 cache in CPUs or a shared scratchpad in [46].…”
Section: A. Making Irregular Data Accesses Predictable
Mentioning, confidence: 99%
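The execution pattern this citation describes, subgraphs running on parallel cores with occasional synchronization for race-free communication through shared memory, can be sketched with Python threads and a barrier. The two-wave split, the operations, and the shared dictionary standing in for the shared memory structure are illustrative assumptions, not details taken from [39], [44], or [46].

```python
# Hedged sketch of the execution pattern in the citation: each thread runs its
# own subgraph of a wave, then all threads hit a barrier before the next wave,
# so reads and writes to the shared value store stay race-free.

import threading

shared = {"x": 1.0, "y": 2.0}          # stands in for the shared L3 / scratchpad
barrier = threading.Barrier(2)          # 2 worker threads

# waves[w][t] = list of (output, fn, inputs) operations for thread t in wave w
waves = [
    [[("a", lambda x, y: x + y, ("x", "y"))],   # wave 0, thread 0
     [("b", lambda x, y: x * y, ("x", "y"))]],  # wave 0, thread 1
    [[("c", lambda a, b: a - b, ("a", "b"))],   # wave 1, thread 0 (needs a, b)
     []],                                        # wave 1, thread 1 idle
]

def worker(tid):
    for wave in waves:
        for out, fn, ins in wave[tid]:
            shared[out] = fn(*(shared[i] for i in ins))
        barrier.wait()                  # synchronize before the next wave

threads = [threading.Thread(target=worker, args=(t,)) for t in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(shared["c"])                      # -> 1.0  (a = 3.0, b = 2.0)
```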
“…• Due to the irregular structure, parallelizing different parts of DAGs across multiple units (like CPU cores, GPU streaming multiprocessors, etc.) demands high communication and synchronization overhead, limiting the parallelization benefits [44].…”
Section: Introduction
Mentioning, confidence: 99%
“…DPU uses an advanced graph-partitioning technique to reduce the number of synchronization barriers, as explained in [10]. The DAGs are partitioned into superlayers (fig.…”
Section: A. Background of DAG Execution
Mentioning, confidence: 99%
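A superlayer groups DAG nodes so that every edge crosses from an earlier group to a later one, which is what lets a single barrier per boundary replace fine-grained synchronization. The sketch below builds such groups by longest-path depth; the actual superlayer construction in [10] is optimization-based and load-balance-aware, so treat this only as an illustration of the invariant.

```python
# Rough illustration of the superlayer idea: group DAG nodes by their depth
# (longest path from any source), so every edge goes from an earlier group to
# a later one and a single barrier per boundary is enough.

from collections import defaultdict

def superlayers(nodes, edges):
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)

    depth = {}
    def node_depth(n):                  # longest-path depth, memoized
        if n not in depth:
            depth[n] = 1 + max((node_depth(p) for p in preds[n]), default=-1)
        return depth[n]

    layers = defaultdict(list)
    for n in nodes:
        layers[node_depth(n)].append(n)
    return [layers[d] for d in sorted(layers)]

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(superlayers(["a", "b", "c", "d"], edges))
# -> [['a'], ['b', 'c'], ['d']]
```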
“…The CUs communicate via a global … The detailed architecture of the CU is explained later in §IV. A specialized compiler [10] is designed that takes an arbitrary DAG and generates the superlayers, schedules operations, performs memory allocation, etc., for DPU execution.…”
Section: A. Compute Units (CU)
Mentioning, confidence: 99%
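The compile flow this citation outlines (superlayer generation, operation scheduling, memory allocation) might look roughly like the skeleton below. The round-robin scheduling policy, the one-slot-per-value allocator, and the function name compile_dag are placeholders for illustration, not the actual compiler of [10].

```python
# Skeleton of a compile flow: take precomputed superlayers, spread each
# layer's operations over the compute units (CUs), and give every produced
# value its own memory slot. Policies here are placeholders, not [10].

def compile_dag(layers, num_cus):
    """layers: list of superlayers, each a list of operation/node names."""
    # Schedule: distribute each superlayer's operations over the CUs round-robin.
    schedule = [
        {cu: layer[cu::num_cus] for cu in range(num_cus)}
        for layer in layers
    ]
    # Memory allocation: assign every produced value a distinct slot.
    address = {}
    for layer in layers:
        for node in layer:
            address[node] = len(address)
    return schedule, address

# Superlayers of the same 4-node diamond DAG used in the earlier sketches.
schedule, address = compile_dag([["a"], ["b", "c"], ["d"]], num_cus=2)
print(schedule)   # -> [{0: ['a'], 1: []}, {0: ['b'], 1: ['c']}, {0: ['d'], 1: []}]
print(address)    # -> {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```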