Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria

Ortega–Arranz, Hector; Torres, Yuri; González-Escribano, Arturo; Llanos, Diego R.

doi:10.1007/s11227-014-1212-z

Cited by 8 publications (14 citation statements)

References 10 publications


“…In the current prototype, the CPU threads granularity is determined by a simple regular blocking policy, that does not require a specific kernel characterization. For GPU kernels, the library integrates the model presented in [15,21]. This model allows the determination of configuration parameters (grid, threadblock, and L1 cache memory sizes), for NVIDIA's GPUs.…”

Section: Controllers Library; classified as mentioning (confidence: 99%)

Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming

Moreton-Fernandez, González-Escribano, Llanos (2017), Int J Parallel Prog. Self-citation.


Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming these heterogeneous systems efficiently has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and hybrid programming that mixes accelerators and CPU cores. However, portability in many cases compromises efficiency on different devices, and some details of coordinating different types of devices must still be tackled by the programmer. In this work we introduce the Multi-Controller (MCtrl), an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions, simplifying the data partition, mapping, and transparent deployment of both simple generic kernels portable across different device types and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA's GPUs, or OpenMP for CPU cores). The runtime system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing the data movements and hiding the launching details. Results of an experimental study with five case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
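The citing statements describe a model that determines configuration parameters (grid, threadblock, and L1 cache sizes) for NVIDIA GPUs. The Python sketch below only shows how a chosen threadblock size and L1 setting translate into a complete launch configuration. It does not reproduce the characterization model itself. The names `LaunchConfig` and `make_launch_config` are hypothetical. The hardware limits are assumptions: a Fermi-class GPU (compute capability 2.x) with 64 KB of on-chip memory per multiprocessor, split 16/48 KB between L1 and shared memory, and at most 1024 threads per block.

```python
from dataclasses import dataclass

# Assumptions for this sketch (not taken from the cited model): a Fermi-class
# GPU (compute capability 2.x) with 64 KB of on-chip memory per multiprocessor,
# split between L1 cache and shared memory as 16/48 KB or 48/16 KB, a limit of
# 1024 threads per block, and block sizes kept to multiples of the 32-thread warp.
ONCHIP_KB = 64
L1_OPTIONS_KB = (16, 48)
MAX_THREADS_PER_BLOCK = 1024
WARP_SIZE = 32


@dataclass(frozen=True)
class LaunchConfig:
    block_size: int  # threads per block
    grid_size: int   # blocks in the grid
    l1_kb: int       # L1 cache size requested for the kernel
    shared_kb: int   # shared memory left on the multiprocessor


def make_launch_config(n_items: int, block_size: int, l1_kb: int) -> LaunchConfig:
    """Hypothetical helper: turn a chosen block size and L1 setting into a full
    launch configuration for a 1-D kernel with one thread per item
    (for example, one thread per graph vertex)."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0 < block_size <= MAX_THREADS_PER_BLOCK or block_size % WARP_SIZE:
        raise ValueError("block size must be a warp multiple in (0, 1024]")
    if l1_kb not in L1_OPTIONS_KB:
        raise ValueError("unsupported L1 cache size")
    # Integer ceiling division, so the last, partially filled block is launched too.
    grid_size = (n_items + block_size - 1) // block_size
    return LaunchConfig(block_size, grid_size, l1_kb, ONCHIP_KB - l1_kb)


if __name__ == "__main__":
    # Example: 100,000 vertices, 192-thread blocks, L1 preferred (48 KB).
    print(make_launch_config(100_000, 192, 48))
```

In CUDA itself, the L1/shared-memory split would be requested per kernel with `cudaFuncSetCacheConfig` (for example, with `cudaFuncCachePreferL1`). The block and grid sizes would be passed in the `<<<grid, block>>>` launch syntax. The values 192 and 48 above are placeholders. In the cited approach, these values are chosen from the kernel's characterization, which this sketch does not attempt.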



“…Our implementation improves the performance of the previous state-of-the-art due to Martín et al [23]. Following the guidelines proposed in [28], we have proposed a refined method to systematically obtain good GPU configuration parameters in terms of GPU code characteristics. The application of this methodology has led to performance improvements of our shortest paths program solutions compared with the use of configuration parameter values suggested by CUDA programming guidelines [21].…”

Section: Answer To the Research Question; classified as mentioning (confidence: 94%)

“…This implementation improves the performance of the previous state-of-the-art solution proposed by Martín et al [23]. Following the guidelines proposed in [28], we have refined the kernel characterization method to obtain more suitable values for the GPU execution parameters, leading to optimal or near-optimal executions. Applying this methodology to our implementation allowed us to obtain more appropriate values, which yielded very significant improvements compared with the values recommended by the CUDA programming guides [21].…” (translated from Spanish)

Section: R4 Conclusions; unclassified

Parallel approaches to shortest-path problems for multilevel heterogeneous computing

Arranz


Keywords: abstract parallel model, APSP, automatic kernel tuning, Boost Graph Library, L1 cache configuration, concurrent kernel execution, CUDA, Dijkstra, GPGPU, heterogeneous systems, HPC framework, kernel characterization model, kernel characterization process, load balancing, MPI, comparison of NVIDIA platforms, OpenMP, optimization techniques, parallel algorithms, SSSP, thread-block geometry.


“…Our model considers the characterization of the kernel code to automatically optimize launching parameters, such as the thread-block geometry. We propose to integrate the model of qualitative characteristics presented in [105,131] in our Controller. To use this model, the programmers should examine the kernel code, and they should conceptually characterize it in classes according to three main criteria.…”

Section: Characterisation Of Kernels For Execution; classified as mentioning (confidence: 99%)

“…For GPU kernels, our current prototype library integrates the model presented in [104,105,131]. This model allows to determine configuration parameters (grid, thread-block and L1 cache memory sizes), for NVIDIA's GPUs.…”

Section: Kernel Characterization; classified as mentioning (confidence: 99%)

Easing parallel programming on heterogeneous systems

Fernández


The use of parallel computing systems frequently represents the only scalable way to solve HPC (High Performance Computing) problems in reasonable execution times. The current trend in high-performance computing platforms is to include several parallel devices of different types and architectures in the same machine, and to interconnect such machines to form highly parallel and heterogeneous distributed systems. Programming efficient and portable parallel applications that can really exploit these systems imposes specific and complex challenges on programmers. A programmer must be proficient in distributed-memory communication tools or layers, shared-memory programming models, and the specific programming models of the available co-processors in order to create hybrid programs that exploit all the machine's capabilities. Moreover, she also has to deal with the proper distribution of workload among the different nodes and devices, assigning to each one an amount of work that matches its computational power and features. Nowadays, all these issues must be solved by the programmer, which makes programming heterogeneous platforms a real challenge. This PhD thesis addresses several of the main problems related to parallel programming for highly heterogeneous and distributed systems. It first tackles the problems of developing efficient coordination code that is portable across different kinds of devices, accelerators, and architectures. It then targets problems related to data communication and partitioning when devices are used in distributed-memory systems. In this dissertation we introduce abstractions, mechanisms, and methods to solve many of these problems. We also discuss their practical application to develop research prototypes and actual programming tools.
Experimental work conducted with these tools validates the applicability of the proposed techniques and the portability, efficiency, and versatility of the programs that can be obtained.
