2022
DOI: 10.1007/978-3-030-95953-1_4
Concurrent Execution of Deferred OpenMP Target Tasks with Hidden Helper Threads

Cited by 17 publications (6 citation statements) | References 5 publications
“…In recent work we implemented loop transformation constructs introduced in OpenMP 5.1 [70,71], asynchronous offloading for OpenMP [132], efficient lowering of idiomatic OpenMP code to GPUs (under review), OpenMP-aware compiler optimizations with informative and actionable remarks for users (under review), a portable OpenMP device (=GPU) runtime written in OpenMP 5.1 (including atomic support) [133], a virtual GPU as a debugging-friendly offloading target on the host [134], and improved diagnostics and execution information [135,136]. We redid the OpenMP GPU code generation in LLVM/Clang [137] to improve performance and correctness.…”
Section: Recent Progress
confidence: 99%
“…Such a mechanism would allow the threads to dispatch many target regions concurrently, even letting a single OpenMP thread manage an "infinite" number of target regions, thus resolving the problem not only for OMPC but for all target devices. In fact, this limitation has already been pointed out by the libomptarget developers [33], but has not been entirely fixed yet.…”
Section: Future Work
confidence: 99%

The OpenMP Cluster Programming Model. Yviquel, Pereira, Francesquini et al., 2022 (Preprint)

“…In [25], several approaches are presented to overlap GPU operations with computations using OpenMP target constructs. They proposed to run asynchronous target tasks on dedicated threads, which are preempted by blocking operations.…”
Section: Related Work
confidence: 99%
“…The completion of GPU operations implies synchronizations that end up blocking threads. Hence, the LLVM OpenMP runtime executes asynchronous target tasks on dedicated Hidden Helper Threads (HHT) [25] implemented as kernel threads. Thus, the operating system can preempt threads blocking on GPU operations, and standard OpenMP threads can be rescheduled onto physical cores to progress other tasks in parallel.…”
Section: OpenMP Target In MPC
confidence: 99%