2021
DOI: 10.1007/978-3-030-85665-6_27
Efficient GPU Computation Using Task Graph Parallelism

Cited by 20 publications (2 citation statements)
References 24 publications
“…The parallel computing community has a number of algorithms including static mapping [41], dynamic work-stealing [20], [21], asymptotic profiling [42], and other system-defined strategies [5], [8], [10], [16]. Vendor-specific features such as CUDA Graph [2], [43] and SYCL [9] offer asynchronous graph scheduling for task parallelism, but their implementation details are unknown. On the other hand, automatic GPU placement has been studied in the machine learning community [44], [45].…”

Section: B. VLSI Placement
confidence: 99%
“…Legion [3,9] is also a task-based runtime designed for distributed machines with heterogeneous nodes. More recently, CUDA Graph [14] lets developers write or capture GPU operations and organizes them into graphs to reduce the kernel launch overhead of CUDA. OpenMP [4,18] target constructs have been introduced in the specification version 4.0.…”

Section: Related Work
confidence: 99%
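The statement above describes the mechanism CUDA Graph uses to cut launch overhead: a sequence of kernel launches is captured into a graph once, instantiated, and then replayed with a single launch call per iteration. A minimal sketch of that pattern, using the standard stream-capture API (the `scale` kernel and the loop count are illustrative placeholders, not from the paper):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: multiplies each element of x by a.
__global__ void scale(float* x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

int main() {
  const int n = 1 << 20;
  float* d;
  cudaMalloc(&d, n * sizeof(float));

  cudaStream_t s;
  cudaStreamCreate(&s);

  // Capture a sequence of kernel launches into a graph instead of
  // submitting them to the driver one by one every iteration.
  cudaGraph_t graph;
  cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
  scale<<<(n + 255) / 256, 256, 0, s>>>(d, 2.0f, n);
  scale<<<(n + 255) / 256, 256, 0, s>>>(d, 0.5f, n);
  cudaStreamEndCapture(s, &graph);

  // Instantiate once; replay many times with a single launch call,
  // amortizing the per-kernel launch overhead across the whole graph.
  cudaGraphExec_t exec;
  cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
  for (int iter = 0; iter < 100; ++iter)
    cudaGraphLaunch(exec, s);
  cudaStreamSynchronize(s);

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
  cudaStreamDestroy(s);
  cudaFree(d);
  return 0;
}
```

The benefit grows with the number of kernels per iteration: each `cudaGraphLaunch` replaces what would otherwise be one driver round-trip per kernel.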