2020
DOI: 10.1007/978-3-030-57675-2_39

SYCL-Bench: A Versatile Cross-Platform Benchmark Suite for Heterogeneous Computing

Abstract: The SYCL standard promises to enable high productivity in heterogeneous programming of a broad range of parallel devices, including multicore CPUs, GPUs, and FPGAs. Its modern and expressive C++ API design, as well as its flexible task graph execution model, give rise to ample optimization opportunities at run-time, such as the overlapping of data transfers and kernel execution. However, it is not clear which of the existing SYCL implementations perform such scheduling optimizations, and to what extent. Furthermore…

Cited by 16 publications (8 citation statements)
References 16 publications

“…On CPU architectures, we find the challenges of NDRange parallelism in library-only SYCL implementations identified by Lal et al [4], as seen in Figures 2a and 2b. With compiler support, these problems are alleviated as shown in Figure 2c.…”
Section: A Dgemm (mentioning)
confidence: 68%
“…This can add considerable overhead and limit vectorization across work-items in library-only SYCL implementations if explicit barriers or collective group algorithms are used (see e.g. [4] for details). However, mapping nd_range parallelism to e.g.…”
Section: Expressing Parallelism In SYCL (mentioning)
confidence: 99%
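
The barrier problem mentioned in the statement above can be illustrated with a minimal sketch in plain C++. It does not use the SYCL API, and it is not taken from the cited papers. The kernel body, the names `run_group_naive` and `run_group_fissioned`, and the single-work-group setup are all hypothetical and chosen only for illustration. Each work-item writes one element of local memory, waits at a barrier, then reads its neighbour's element. Running work-items one after another in a simple loop breaks the barrier semantics, which is why a library-only implementation has to fall back on something heavier (for example, one thread or fiber per work-item). A compiler, by contrast, can split the kernel at the barrier into two loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel, per work-item `lid` in one work-group:
//   local[lid] = in[lid] * 2;                                // phase 1
//   barrier();                                               // wait for all items
//   out[lid]   = local[lid] + local[(lid + 1) % group_size]; // phase 2

// Naive sequential execution: each work-item runs the whole kernel body
// before the next one starts, so the barrier is effectively ignored.
// Phase 2 may read a neighbour's slot that has not been written yet.
std::vector<int> run_group_naive(const std::vector<int>& in) {
    const std::size_t group_size = in.size();
    assert(group_size > 0);
    std::vector<int> local(group_size, 0), out(group_size, 0);
    for (std::size_t lid = 0; lid < group_size; ++lid) {
        local[lid] = in[lid] * 2;
        out[lid] = local[lid] + local[(lid + 1) % group_size];
    }
    return out;
}

// Loop fission at the barrier: every work-item finishes phase 1 before any
// work-item starts phase 2. Both loops are simple and have independent
// iterations, so they are candidates for vectorization.
std::vector<int> run_group_fissioned(const std::vector<int>& in) {
    const std::size_t group_size = in.size();
    assert(group_size > 0);
    std::vector<int> local(group_size, 0), out(group_size, 0);
    for (std::size_t lid = 0; lid < group_size; ++lid) {
        local[lid] = in[lid] * 2;                        // phase 1
    }
    // The barrier sits here: all phase-1 writes are complete.
    for (std::size_t lid = 0; lid < group_size; ++lid) {
        out[lid] = local[lid] + local[(lid + 1) % group_size];  // phase 2
    }
    return out;
}
```

For the input `{1, 2, 3, 4}`, the fissioned version produces `{6, 10, 14, 10}`, while the naive version produces `{2, 4, 6, 10}` because most work-items read a neighbour's slot before it has been written.
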
“…SYCL-Bench also integrates fifteen kernels/applications from Polybench. This suite also has the possibility to execute on different SYCL implementations, such as DPC++, ComputeCpp, triSYCL, and AdaptiveCpp [31].…”
Section: Memory Liberation (mentioning)
confidence: 99%
“…The Atom CPU with the AdaptiveCpp compiler obtains the worst performance because SYCL code is translated to OpenMP, while DPC++ is conducted by the OpenCL backend. This point makes the difference between both implementations [31,50]. When comparing DPC++ performance on both CPU and GPU (UHD Graphics), it's noteworthy that the Atom processor even outperforms the GPU.…”
Section: Polybench Experiments (mentioning)
confidence: 99%