2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications
DOI: 10.1109/ispa.2012.92

Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs

Abstract: The NVIDIA graphics processing units (GPUs) are playing an important role as general-purpose programming devices. The implementation of parallel codes to exploit the GPU hardware architecture is a task for experienced programmers. The thread-block size and shape choice is one of the most important user decisions when a parallel problem is coded. The thread-block configuration has a significant impact on the global performance of the program. While in the CUDA parallel programming model it is always necessar…
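To make the thread-block configuration decision concrete, the following minimal CUDA sketch launches the same kernel with two block shapes of equal thread count; the kernel scale2D and all dimensions are illustrative assumptions, not taken from the paper. On Fermi-class GPUs the two shapes can perform differently because of memory coalescing and occupancy effects.

#include <cuda_runtime.h>

// Illustrative kernel: each thread scales one element of a 2D array.
__global__ void scale2D(float *data, int width, int height, float factor) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        data[y * width + x] *= factor;
}

int main() {
    const int width = 1024, height = 1024;
    float *d_data;
    cudaMalloc(&d_data, width * height * sizeof(float));

    // Two block shapes with the same thread count (256). On Fermi,
    // 32 threads along x fill whole warps with consecutive addresses,
    // which favours coalesced global-memory access.
    dim3 blockA(16, 16);
    dim3 blockB(32, 8);
    dim3 gridA((width + blockA.x - 1) / blockA.x, (height + blockA.y - 1) / blockA.y);
    dim3 gridB((width + blockB.x - 1) / blockB.x, (height + blockB.y - 1) / blockB.y);

    scale2D<<<gridA, blockA>>>(d_data, width, height, 2.0f);
    scale2D<<<gridB, blockB>>>(d_data, width, height, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}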

Cited by 10 publications (9 citation statements); references 11 publications (12 reference statements).
“…OpenCL programs are compiled just-in-time for execution and can be used together with Mi-AccLib or other run-time libraries. These works [16][17][18] experienced a performance penalty on the NVIDIA GPU due to the OpenCL abstraction layer. Thus, we have disabled OpenCL support, as it is not currently optimized for GPUs, and real gains on GPUs can only be achieved through optimized code, since there are additional overheads from data movement.…”
Section: Related Work
confidence: 99%
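To show where that just-in-time compilation step sits, here is a minimal host-side sketch using the standard OpenCL C API; the scale kernel is an illustrative assumption, not code from the cited works, and error handling is elided for brevity.

#include <CL/cl.h>

// OpenCL kernels ship as source strings and are built at run time.
const char *src =
    "__kernel void scale(__global float *d, float f) {"
    "    int i = get_global_id(0);"
    "    d[i] *= f;"
    "}";

int main() {
    cl_platform_id plat;
    clGetPlatformIDs(1, &plat, NULL);
    cl_device_id dev;
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);

    // Compilation happens here, at program run time: this is the
    // just-in-time step (and part of the overhead) discussed above.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);

    clReleaseKernel(k);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}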
“…Programming an application on the GPU is a non-trivial task, and many key factors must be exploited for the application to achieve optimal performance [20]. One of these factors is having every work-item within a warp execute the same set of instructions along the same execution path.…”
Section: GPU Architecture
confidence: 99%
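The statement above describes avoiding warp divergence. As a hedged CUDA sketch (both kernels are hypothetical, not from the cited work), the first kernel branches on an index pattern that splits lanes within a warp, while the second branches on a warp-aligned quantity so all 32 lanes of a warp take the same path.

#include <cuda_runtime.h>

// Divergent: even/odd lanes within the same warp take different
// branches, so the warp executes both paths serially.
__global__ void divergentKernel(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)
        out[i] = in[i] * 2.0f;
    else
        out[i] = in[i] + 1.0f;
}

// Uniform: branching on the warp index keeps all 32 lanes of each
// warp on the same execution path, avoiding serialization.
__global__ void uniformKernel(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((i / warpSize) % 2 == 0)
        out[i] = in[i] * 2.0f;
    else
        out[i] = in[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    int block = 256, grid = (n + block - 1) / block;
    divergentKernel<<<grid, block>>>(out, in, n);
    uniformKernel<<<grid, block>>>(out, in, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}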
“…Caches affect application performance in a significant manner, as confirmed by several researchers. [2][3][4][5][6][7][8][9][10][11] This makes management of GPU caches extremely important. While CPU cache management has been studied for years, GPU cache management is a relatively new research field.…”
Section: Introduction
confidence: 99%
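One concrete example of a GPU cache-management control on Fermi, the architecture the paper targets, is the configurable split of per-SM on-chip memory between L1 cache and shared memory, exposed through the CUDA runtime. A minimal sketch, assuming a placeholder kernel named myKernel:

#include <cuda_runtime.h>

// Placeholder kernel; its name and body are illustrative only.
__global__ void myKernel(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;
}

int main() {
    // Fermi splits 64 KB of per-SM on-chip memory between L1 cache and
    // shared memory. Preferring L1 (48 KB L1 / 16 KB shared) can help
    // cache-sensitive kernels that use little shared memory.
    cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);

    float *d;
    cudaMalloc(&d, 1024 * sizeof(float));
    myKernel<<<4, 256>>>(d);   // 4 blocks x 256 threads = 1024 elements
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}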