2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/sc.2010.36
OpenMPC: Extended OpenMP Programming and Tuning for GPUs

Cited by 186 publications (111 citation statements)
References 13 publications
“…In contrast, general-purpose automatic parallelization compilers for accelerators including GPGPUs have been appearing recently, such as OpenACC [20], PGI Accelerator [27], and CAPS HMPP [28]. Moreover, academic proposals that assist in writing OpenCL and CUDA programs have also been presented [29]-[31]. HiCrypt differs from them in purpose: it is specialized for symmetric block ciphers.…”
Section: Translation Results (citation type: mentioning; confidence: 99%)
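For context on the directive-based model those compilers share: the programmer keeps sequential C and annotates offloadable loops, and the compiler generates the device kernel and the host-device transfers. Below is a minimal sketch in the OpenACC style; the vec_add function and its variable names are illustrative placeholders, not code from the cited papers.

/* Illustrative offloadable loop: the pragma asks an OpenACC-capable
 * compiler to build a GPU kernel for the loop and to copy the inputs
 * in and the result out. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

The appeal of this style, which the quoted passage gestures at, is that the same source still compiles as plain sequential C when the pragma is ignored.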
“…Other experimental compilation tools like CGCM [16] and PAR4ALL [1] aim at automating the process of CPU-GPU communication and the detection of the pieces of code that can run in parallel. The work by Lee and Eigenmann [20] proposes OpenMPC, an API to facilitate translation of OpenMP programs to CUDA, and a compilation system to support it.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
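As a rough illustration of the programming model that quote describes: OpenMPC keeps the standard OpenMP annotations and adds #pragma cuda tuning directives that steer the OpenMP-to-CUDA translation. A minimal sketch, assuming the paper's "#pragma cuda gpurun" directive form; the registerRO clause and the saxpy loop are illustrative choices, not an excerpt from the paper.

/* Standard OpenMP loop plus an OpenMPC-style tuning directive (sketch). */
void saxpy(int n, float a, float *x, float *y)
{
    /* Assumed OpenMPC syntax: ask the translator to map this region to
     * a CUDA kernel and keep the read-only scalar 'a' in registers. */
    #pragma cuda gpurun registerRO(a)
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

A compiler that does not know the #pragma cuda extension simply ignores it, so one source can serve both CPU and GPU builds.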
“…Lee and Vetter evaluate 8 Rodinia benchmarks (out of the 15 we evaluate) and some scientific kernels, such as Jacobi or kernels from the NAS benchmarks. They also evaluate the PGI, CAPS, OpenMPC [20,21], and R-Stream [23] compilers. However, the main difference from this study is that the work reported here also includes the transformation steps that programmers must follow to turn OpenMP programs into directive-based programs so that these compilers can generate efficient accelerator code.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
“…Several previous studies [1, 6, 8-11] have explored directive-based language extensions and compiler techniques to exploit parallelism using NVIDIA GPUs. We briefly mention a few of them in this section.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
“…Lee and Eigenmann [10] presented an approach for directly translating OpenMP CPU code to GPU code without using language extensions. Compiler analysis finds the synchronization points in each parallel region, which can then be split into multiple subregions as needed to generate multiple CUDA kernels.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
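A hand-written sketch of that splitting idea (not actual translator output): CUDA offers no portable device-wide barrier inside a kernel, so a parallel region containing an OpenMP barrier is split at the synchronization point into two kernels, and the in-order execution of launches on the same stream supplies the global synchronization. The region_part1/region_part2 names, the phase bodies, and the assumption that the pointers are device pointers are all illustrative.

#include <cuda_runtime.h>

/* Original region (conceptually):
 *   #pragma omp parallel
 *   { phase1();
 *     #pragma omp barrier
 *     phase2(); }
 */

/* Subregion before the barrier: each GPU thread does phase-1 work. */
__global__ void region_part1(const float *in, float *tmp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = in[i] * 2.0f;          /* placeholder phase-1 work */
}

/* Subregion after the barrier: reads a neighbor's phase-1 result,
 * which is exactly the cross-thread dependence the barrier protected. */
__global__ void region_part2(const float *tmp, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n) out[i] = tmp[i] + tmp[i - 1];
}

void run_region(const float *in, float *tmp, float *out, int n)
{
    int threads = 256, blocks = (n + threads - 1) / threads;
    region_part1<<<blocks, threads>>>(in, tmp, n);
    /* Kernels launched on the same stream execute in order, so this
     * launch boundary stands in for the OpenMP barrier. */
    region_part2<<<blocks, threads>>>(tmp, out, n);
}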