Exploiting graphical processing units for data‐parallel scientific applications

Leist, A.; Playne, Daniel P.; Hawick, K. A.

doi:10.1002/cpe.1462

Cited by 37 publications

(33 citation statements)

References 46 publications


“…Our experiments have showed that this technique can slightly improve the kernel execution time when processing scale-free graphs with their "fat tail" degree distributions, but the overall execution time, which includes the time required to sort the vertices by the CPU, increases considerably. These findings are in line with previous results reported in [29]. Consequently, the implementation of Kernel 2 reported here does not sort the vertices.…”

Section: Vertex-based Kernel (supporting)

confidence: 93%
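
The statement above refers to reordering vertices by degree before launching a vertex-based kernel; the usual motivation is that threads handling vertices of similar degree finish at similar times. A minimal Python sketch of that CPU-side sorting step follows, assuming a CSR-style `offsets` array. The helper name `degree_sorted_order` is hypothetical and not taken from the cited work.

```python
from typing import List


def degree_sorted_order(offsets: List[int]) -> List[int]:
    """Return vertex IDs ordered by decreasing degree.

    `offsets` is a CSR-style offset array of length n + 1: vertex v's
    neighbours occupy adjacency[offsets[v]:offsets[v + 1]], so its degree
    is offsets[v + 1] - offsets[v].
    """
    n = len(offsets) - 1
    degrees = [offsets[v + 1] - offsets[v] for v in range(n)]
    # sorted() is stable, so vertices of equal degree keep their original order.
    return sorted(range(n), key=lambda v: -degrees[v])


# Degrees are [1, 4, 2, 1]; the high-degree "hub" (vertex 1) comes first.
order = degree_sorted_order([0, 1, 5, 7, 8])
print(order)  # [1, 2, 0, 3]
```

The sort costs O(n log n) on the host plus a transfer of the permutation to the device, which is the kind of overhead the quoted experiments found to outweigh the small kernel-time improvement.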

“…We can also make use of specialised memory types available on the GPU to improve memory access. In our previous work [29] it was shown that the optimal memory type to use for simulating field equations is the texture memory type. The texture memory type uses an on-chip memory cache that caches values from global memory in the spatial locality.…”

Section: CUDA Implementations (mentioning)

confidence: 99%
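
As an illustration of the access pattern in question, the following Python sketch computes a five-point finite-difference Laplacian on a periodic grid. The stencil, the periodic boundaries and the name `laplacian_periodic` are assumptions for illustration. The sketch does not model texture memory; it only shows that each cell reads its four spatially adjacent neighbours, which is the 2D locality a texture cache is meant to serve.

```python
from typing import List

Grid = List[List[float]]


def laplacian_periodic(u: Grid, h: float = 1.0) -> Grid:
    """Five-point finite-difference Laplacian with periodic boundaries.

    Each output cell reads its four nearest neighbours; on a GPU with one
    thread per cell, neighbouring threads read overlapping, spatially
    close values.
    """
    ny, nx = len(u), len(u[0])
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            # Wrap indices to implement periodic boundary conditions.
            up = u[(y - 1) % ny][x]
            down = u[(y + 1) % ny][x]
            left = u[y][(x - 1) % nx]
            right = u[y][(x + 1) % nx]
            out[y][x] = (up + down + left + right - 4.0 * u[y][x]) / (h * h)
    return out


# A single spike in the middle of a 3x3 periodic grid.
u = [[0.0] * 3 for _ in range(3)]
u[1][1] = 1.0
lap = laplacian_periodic(u)
print(lap)  # [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
```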

“…The Cahn-Hilliard equation is a phase-transition model of a quenching binary alloy, component labelling can be used to determine which phase the field is currently in. The Cahn-Hilliard simulation used is the GPU-based simulation described in [29]. By testing the labelling algorithms in conjunction with a real GPU simulation, it will provide performance data of the algorithms as they would be used in a real situation.…”

Section: Field Equation Data (mentioning)

confidence: 99%
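
One simple data-parallel way to label phase regions in such a field is iterative minimum-label propagation on the mesh. The Python sketch below assumes that the phase is given by the sign of the field value and that boundaries are non-periodic; it is not the labelling algorithm of the cited paper, and `label_phases` is a hypothetical name.

```python
from typing import List


def label_phases(field: List[List[float]]) -> List[List[int]]:
    """Label 4-connected regions of equal phase in a 2D field.

    Phase is taken as the sign of the field value (an assumption). Every
    cell starts with its own linear index as its label; each sweep, every
    cell adopts the smallest label among itself and its same-phase
    neighbours, reading only the previous sweep's labels (a synchronous,
    data-parallel update). Sweeps repeat until no label changes.
    """
    ny, nx = len(field), len(field[0])
    phase = [[value >= 0.0 for value in row] for row in field]
    labels = [[y * nx + x for x in range(nx)] for y in range(ny)]
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in labels]
        for y in range(ny):          # on a GPU: one thread per cell
            for x in range(nx):
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < ny and 0 <= xx < nx
                            and phase[yy][xx] == phase[y][x]
                            and labels[yy][xx] < new[y][x]):
                        new[y][x] = labels[yy][xx]
                        changed = True
        labels = new
    return labels


field = [[0.8, 0.6, -0.7],
         [-0.9, 0.5, -0.4],
         [-0.6, -0.8, 0.9]]
print(label_phases(field))  # [[0, 0, 2], [3, 0, 2], [3, 3, 8]]
```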


Parallel graph component labelling with GPUs and CUDA

2010

Self Cite


Graph component labelling, which is a subset of the general graph colouring problem, is a computationally expensive operation that is of importance in many applications and simulations. A number of data-parallel algorithmic variations to the component labelling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the CUDA GPU programming language. We discuss implementation issues and performance results on GPUs using CUDA. We present results for regular mesh graphs as well as arbitrary structured and topical graphs such as small-world and scale-free structures. We show how different algorithmic variations can be used to best effect depending upon the cluster structure of the graph being labelled and consider how features of the GPU architectures and host CPUs can be combined to best effect into a cluster component labelling algorithm for use in high performance simulations.
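
The abstract describes data-parallel variations of component labelling on arbitrary graphs. A minimal Python sketch of one generic variation, vertex-based minimum-label propagation over a compact (CSR-style) adjacency list, follows. The data layout and the name `label_components` are assumptions, and the loop over `v` stands in for what one GPU thread per vertex would do.

```python
from typing import List


def label_components(offsets: List[int], adjacency: List[int]) -> List[int]:
    """Min-label propagation over an undirected graph in CSR form.

    Vertex v's neighbours are adjacency[offsets[v]:offsets[v + 1]]; every
    undirected edge is stored in both directions. Each vertex starts with
    its own ID as label and repeatedly adopts the smallest neighbour label.
    """
    n = len(offsets) - 1
    labels = list(range(n))
    changed = True
    while changed:
        changed = False
        new = labels[:]              # read old labels, write new ones
        for v in range(n):           # on a GPU: one thread per vertex
            for i in range(offsets[v], offsets[v + 1]):
                u = adjacency[i]
                if labels[u] < new[v]:
                    new[v] = labels[u]
                    changed = True
        labels = new
    return labels


# Edges 0-1, 1-2 and 3-4; vertex 5 is isolated.
offsets = [0, 1, 3, 4, 5, 6, 6]
adjacency = [1, 0, 2, 1, 4, 3]
print(label_components(offsets, adjacency))  # [0, 0, 0, 3, 3, 5]
```

Because each sweep reads only the previous labels, all vertices can be updated independently; the number of sweeps is bounded by the largest component diameter plus one final sweep that detects convergence, so graphs with long chains need many iterations.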


“…Many later studies [17] use that data structure as foundational graph representation. A. Leist et al [18] propose another kind of graph representation which makes some improvement on compact adjacency list. As Fig.…”

Section: B. Compact Adjacency List (mentioning)

confidence: 99%
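
For reference, a plain compact adjacency list (the baseline representation the quote mentions, not the improved variant by Leist et al., which is not described here) stores all neighbour lists in one flat array plus an offsets array. A Python sketch that builds it from an undirected edge list; `build_compact_adjacency` is a hypothetical name.

```python
from typing import List, Tuple


def build_compact_adjacency(n: int, edges: List[Tuple[int, int]]
                            ) -> Tuple[List[int], List[int]]:
    """Build offsets + flat neighbour array for an undirected graph.

    Vertex v's neighbours end up in adjacency[offsets[v]:offsets[v + 1]];
    each edge (a, b) is stored twice, once per endpoint.
    """
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Exclusive prefix sum of degrees gives each vertex's start position.
    offsets = [0] * (n + 1)
    for v in range(n):
        offsets[v + 1] = offsets[v] + degree[v]
    adjacency = [0] * offsets[n]
    cursor = offsets[:-1]            # copy: next free slot per vertex
    for a, b in edges:
        adjacency[cursor[a]] = b
        cursor[a] += 1
        adjacency[cursor[b]] = a
        cursor[b] += 1
    return offsets, adjacency


offsets, adjacency = build_compact_adjacency(4, [(0, 1), (0, 2), (2, 3)])
print(offsets)    # [0, 2, 3, 5, 6]
print(adjacency)  # [1, 2, 0, 0, 3, 2]
```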

Design and Implementation of GPU-Based Prim's Algorithm

Wang, Huang, Guo

2011

IJMECS


Abstract: Minimum spanning tree is a classical problem in graph theory that plays a key role in a broad domain of applications. This paper proposes a minimum spanning tree algorithm using Prim's approach on Nvidia GPU under CUDA architecture. By using a newly developed GPU-based Min-Reduction data parallel primitive in the key step of the algorithm, higher efficiency is achieved. Experimental results show that we obtain about 2 times speedup on Nvidia GTX260 GPU over the CPU implementation and 3 times speedup over non-primitives GPU implementation.
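
The abstract names a min-reduction primitive as the key step of Prim's algorithm. The Python sketch below, assuming a dense symmetric weight matrix and a connected graph, shows where such a reduction fits: each iteration reduces over the keys of vertices not yet in the tree to pick the next vertex. The tree-shaped `min_reduce` runs its rounds sequentially and only imitates the structure of a parallel reduction; neither function is taken from the cited paper.

```python
import math
from typing import List, Tuple

INF = math.inf


def min_reduce(pairs: List[Tuple[float, int]]) -> Tuple[float, int]:
    """Tree-shaped min-reduction over (key, vertex) pairs.

    Each round halves the array, as a parallel reduction would; here the
    rounds run sequentially. Ties are broken by the smaller vertex ID.
    """
    data = list(pairs)
    while len(data) > 1:
        data = [min(data[i:i + 2]) for i in range(0, len(data), 2)]
    return data[0]


def prim_mst(weights: List[List[float]]) -> Tuple[float, List[int]]:
    """Prim's algorithm on a dense symmetric weight matrix (INF = no edge).

    Returns the total tree weight and each vertex's parent (-1 for root).
    """
    n = len(weights)
    in_tree = [False] * n
    key = [INF] * n
    parent = [-1] * n
    key[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Key step: min-reduction over vertices not yet in the tree.
        k, u = min_reduce([(INF if in_tree[v] else key[v], v) for v in range(n)])
        if k == INF:
            raise ValueError("graph is not connected")
        in_tree[u] = True
        total += k
        # Update step: each remaining vertex relaxes its key through u
        # independently of the others, so this loop is data-parallel.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < key[v]:
                key[v] = weights[u][v]
                parent[v] = u
    return total, parent


W = [[0, 1, 4, INF],
     [1, 0, 2, 6],
     [4, 2, 0, 3],
     [INF, 6, 3, 0]]
print(prim_mst(W))  # (6.0, [-1, 0, 1, 2])
```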


“…It is possible to bring parallelism to bear on many problems using hybrids of cluster-computing approaches; accelerator technologies such as general purpose graphical processing unit (GP-GPU); and the use of many threads within a conventional multi-core CPU. These are typified by software technologies such as the open standard Message Passing Interface (MPI) [12], [13]; NVIDIA's Compute Unified Device Architecture (CUDA) [14]-[16] for GPUs; and Intel's Thread Building Blocks (TBB) [17], [18] software for multi-threaded programming multi-core devices, respectively. It is however tedious, error prone and non-trivial for a programmer or even a programming team to implement an application that works well across all these three parallel paradigms or platforms - even for a problems like finite-difference equation solving that have relatively well known solutions.…”

Section: Introduction (mentioning)

confidence: 99%

Auto-generation of Parallel Finite-Differencing Code for MPI, TBB and CUDA

Playne

Hawick

2011

2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PHD Forum


Abstract: Finite-difference methods can be useful for solving certain partial differential equations (PDEs) in the time domain. Compiler technologies can be used to parse an application domain specific representation of these PDEs and build an abstract representation of both the equation and the desired solver. This abstract representation can be used to generate a language-specific implementation. We show how this framework can be used to generate software for several parallel platforms: Message Passing Interface (MPI), Threading Building Blocks (TBB) and Compute Unified Device Architecture (CUDA). We present performance data of the automatically-generated parallel code and discuss the implications of the generator in terms of code portability, development time and maintainability.
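
To illustrate the idea of generating solver code from an abstract representation, here is a deliberately tiny Python sketch. It assumes the representation is just a map from stencil offsets to coefficients and emits a Python function for one forward-Euler step of a 1D equation with periodic boundaries. The framework in the paper targets MPI, TBB and CUDA instead, and `generate_update` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple


def generate_update(stencil: Dict[int, float],
                    scale: float) -> Tuple[Callable[[List[float]], List[float]], str]:
    """Generate an explicit 1D finite-difference update from a stencil spec.

    The emitted function computes
        u_new[i] = u[i] + scale * sum_k c_k * u[(i + k) mod n]
    i.e. one forward-Euler step with periodic boundaries.
    Returns the compiled function and its generated source text.
    """
    terms = " + ".join(f"({c!r}) * u[(i + ({k})) % n]"
                       for k, c in sorted(stencil.items()))
    source = (
        "def update(u):\n"
        "    n = len(u)\n"
        f"    return [u[i] + ({scale!r}) * ({terms}) for i in range(n)]\n"
    )
    namespace: dict = {}
    exec(source, namespace)          # compile the generated code
    return namespace["update"], source


# Diffusion u_t = D u_xx with stencil (1, -2, 1) and scale = D*dt/dx**2.
step, src = generate_update({-1: 1.0, 0: -2.0, 1: 1.0}, scale=0.25)
print(src)
print(step([0.0, 0.0, 4.0, 0.0, 0.0]))  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

Keeping the equation as data and emitting code from it is what lets one representation feed several back ends; here there is only one back end, so the sketch shows the mechanism rather than the portability benefit.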
