NAS Parallel Benchmarks with CUDA and beyond

Araujo, Gabriell; Griebler, Dalvan; Rockenbach, Dinei; Danelutto, Marco; Fernandes, Luiz Gustavo

doi:10.1002/spe.3056

Cited by 18 publications

(8 citation statements)

References 12 publications

(90 reference statements)

“…There is an ongoing effort to create SkePU implementations, and subsequently evaluations, of many benchmark workloads across several benchmark suites. Such suites include Rodinia [10], PARSEC [9] and its parallel derivate P3ARSEC [15], Poly-Bench [34], and NAS Parallel Benchmarks [5,8,27]. The complexity and effort required for benchmarking parallel programming models, interfaces, and frameworks is well-known [32] and examples of ongoing efforts to simplify and standardize parallel benchmark suites are many, including P3ARSEC and Task Bench.…”

Section: Benchmarks (mentioning)

confidence: 99%

“…The original version was written in Fortran and the parallel implementations were in OpenMP and MPI. In recent years, an effort was made to provide parallel versions for C/C++ parallel programming frameworks on multicore systems [26,27] as well as heterogeneous parallel programming on GPUs [5,6,19].…”

Section: NAS Parallel Benchmarks (mentioning)

confidence: 99%

“…In addition to the lightweight STREAM workloads, we also considered NPB as a means to select additional possible evaluation points, since these are different computations and are considered standard benchmarks for HPC evaluation. Like STREAM, NPB has not been subject to SkePU parallelization before, and given the recent work on NPB implementations in both C++ parallel CPU frameworks and CUDA for Nvidia GPUs [5,27], we have good reference points for efficiency comparison against SkePU implementations. However, SkePUizing the entirety of NPB will be a future work, because the experience from this initial effort will indicate the viability of such a project.…”

Section: Benchmark Selection and Implementation (mentioning)

confidence: 99%

Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems

Ernstsson, Griebler, Keßler (2022). Int J Parallel Prog. Self-citation.

We analyze the performance portability of the skeleton-based, single-source multi-backend high-level programming framework SkePU across multiple different CPU–GPU heterogeneous systems. Thereby, we provide a systematic application efficiency characterization of SkePU-generated code in comparison to equivalent hand-written code in more low-level parallel programming models such as OpenMP and CUDA. For this purpose, we contribute ports of the STREAM benchmark suite and of a part of the NAS Parallel Benchmark suite to SkePU. We show that for STREAM and the EP benchmark, SkePU regularly scores efficiency values above 80% and in particular for CPU systems, SkePU can outperform hand-written code.

“…No manual development and no special Field Programmable Gate Array (FPGA) or programming knowledge are required. The logic generated by this improved approach is up to 43 times faster than its hand-optimized High Level Synthesis (HLS) counterpart, depending on the solution method. The third paper titled "NAS Parallel Benchmarks with Compute Unified Device Architecture (CUDA) and Beyond" by Fernandes et al 3 provides a new CUDA implementation for NASA Parallel Benchmark (NPB). The performance results have shown up to 267% improvements over the best benchmark versions available.…”

mentioning

confidence: 99%
