2020
DOI: 10.1007/s11227-020-03452-2
Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading


Cited by 4 publications (1 citation statement)
References 23 publications (34 reference statements)
“…Figure 4 (excerpt of DCU hardware information). NVIDIA's CUDA computation is based on a 32-thread-wide thread bundle, generally called a Warp, while OpenMP Offload computation for the GCN family of hardware on the DCU platform is based on a 64-thread-wide bundle, called a Wavefront [10]. Each CU has an SU (Scalar Unit), a processing unit shared by all threads in the wavefront for flow control, pointer computation, etc.…”
Section: DCU Architecture
confidence: 99%