2022
DOI: 10.1007/s11227-022-04932-3

NAS Parallel Benchmarks with Python: a performance and programming effort analysis focusing on GPUs

Cited by 1 publication (4 citation statements)
References 22 publications
“…The performance gap widened with increasing N and reached about 22x and 7.8x compared to Numba and CuPy for N = 2 × 10^9. These findings are in line with those of previous studies (e.g., [39,41]).…”
Section: Results (supporting)
confidence: 94%
“…The generated PRNs are immediately consumed within the kernel without storing them in the global memory. We compared our results to our implementation in CUDA C. Note that the 1D MCRT test problem was implemented with 15, 26, and 37 lines of code in CuPy, Numba, and CUDA C, respectively, reflecting the ease of implementation in CuPy and Numba relative to CUDA C [41].…”
Section: Methods (mentioning)
confidence: 99%
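
A minimal sketch of the in-kernel PRN pattern described in that citation, assuming Numba's CUDA random-number API. This is not the cited 1D MCRT implementation; the Monte Carlo pi estimator, kernel name, and launch parameters below are illustrative only. The point it shows is that random numbers are generated and consumed inside the kernel, in registers, without ever being written to global memory.

import numpy as np
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32

@cuda.jit
def mc_pi_kernel(rng_states, n_samples, hits):
    # One xoroshiro128+ state per thread; each PRN is used immediately.
    tid = cuda.grid(1)
    if tid < hits.shape[0]:
        count = 0
        for _ in range(n_samples):
            x = xoroshiro128p_uniform_float32(rng_states, tid)
            y = xoroshiro128p_uniform_float32(rng_states, tid)
            if x * x + y * y <= 1.0:
                count += 1
        hits[tid] = count

threads, blocks, n_samples = 256, 64, 10_000  # illustrative sizes
rng_states = create_xoroshiro128p_states(threads * blocks, seed=42)
hits = cuda.device_array(threads * blocks, dtype=np.int32)
mc_pi_kernel[blocks, threads](rng_states, n_samples, hits)
print("pi ~", 4.0 * hits.copy_to_host().sum() / (threads * blocks * n_samples))

Keeping the PRN generation inside the kernel avoids pre-generating and storing large random arrays in global memory, which is one reason the Numba and CuPy versions in the cited work stay compact relative to CUDA C.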