Xiaonan Tian scite author profile

Existing studies show that using a single GPU can yield significant performance gains, and using more than one GPU should yield further speedup. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered a leading candidate for porting complex scientific applications. Unfortunately, programming heterogeneous systems requires more effort than programming traditional multicore systems. Directive-based programming approaches are being widely adopted because they make application code easy to use, port, and maintain. OpenMP and OpenACC are two popular models used to port applications to accelerators. However, neither model provides support for multiple GPUs. A plausible solution is to combine OpenMP and OpenACC into a hybrid model; however, building this model has its own limitations due to the lack of necessary compiler support. Moreover, the hybrid model also lacks support for direct device-to-device communication. To overcome these limitations, an alternative strategy is to extend OpenACC with extensions that follow a task-based implementation for supporting multiple GPUs. We critically analyze the applicability of the hybrid model approach, evaluate the proposed strategy using several case studies, and demonstrate the effectiveness of the proposed extensions.


Compiler transformation of nested loops for general purpose GPUs

Tian, Chandrasekaran, et al. 2015. Concurrency and Computation


SUMMARY: Manycore accelerators have the potential to significantly improve the performance of scientific applications when computationally intensive program portions are offloaded to them. Directive-based high-level programming models, such as OpenACC and OpenMP, are used to create applications for accelerators by annotating regions of code meant for offloading. OpenACC is an emerging directive-based programming model for accelerators that typically enables inexperienced programmers to achieve portable and productive performance within applications. In this paper, we present the challenges we encountered and the solutions we developed when creating an open-source OpenACC compiler in an industrial framework (OpenUH as a branch of Open64). We then discuss in detail techniques we developed for loop scheduling reduction operations on general purpose GPUs. The compiler is evaluated with benchmarks from the NAS Parallel Benchmarks suite and self-written micro-benchmarks for reduction operations. This implementation has been designed to serve as a compiler infrastructure for researchers to explore advanced compiler techniques, extend OpenACC to other programming models, and build performance tools used in conjunction with OpenACC programs.




Xiaonan Tian

In situ formation of Ni₃Se₄ nanorod arrays as versatile electrocatalysts for electrochemical oxidation reactions in hybrid water electrolysis

NAS Parallel Benchmarks for GPGPUs Using a Directive-Based Programming Model

Compiling a High-Level Directive-Based Programming Model for GPGPUs

Multi-GPU Support on Single Node Using Directive-Based Programming Model

Compiler transformation of nested loops for general purpose GPUs
