2017
DOI: 10.1007/s11227-017-2159-7
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment

Abstract: The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector pairs: tuning of a GPU kernel with consideration of memory coalescing and using …
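The core computational pattern of the paper, all-pairs similarity over many multidimensional vectors, can be sketched as follows. This is a minimal pure-Python illustration of the problem itself (here using cosine similarity as an example metric), not the paper's tuned OpenMP+CUDA implementation:

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def all_pairs_similarity(vectors):
    """Return {(i, j): similarity} for every unordered pair i < j.

    The number of pairs grows quadratically with the number of
    vectors, which is why parallelization pays off at scale.
    """
    return {(i, j): cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = all_pairs_similarity(vecs)
# sims[(0, 1)] == 0.0 (orthogonal vectors)
```

In the hybrid setting the paper targets, the pair index space of this loop is what gets partitioned between CPU threads and GPU kernels.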


Cited by 13 publications (6 citation statements)
References 23 publications
“…The former can be implemented with, e.g., OpenMP, OpenCL or Pthreads for CPUs and CUDA, OpenCL, OpenACC for GPUs, while the latter can be implemented typically with MPI. Paper [4] presents an exemplary implementation and optimization of parallelization of large vector similarity computations in a hybrid CPU+GPU environment, including load balancing and finding configuration parameters. CUDA-aware MPI implementations allow using CUDA buffers in MPI calls which simplifies implementation.…”
Section: Related Work and Motivations (mentioning, confidence: 99%)
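The load balancing mentioned in the citation above can be illustrated with a simple static split of work between CPU and GPU. This is a hedged sketch of the general idea only; the function name and throughput parameters are illustrative assumptions, and the cited paper's actual scheme involves tuned configuration parameters:

```python
def split_work(total_items, cpu_throughput, gpu_throughput):
    """Statically split `total_items` work units between CPU and GPU
    in proportion to measured throughputs (items per second).

    Illustrative only: throughput figures would come from
    calibration runs on the target hybrid system.
    """
    gpu_share = gpu_throughput / (cpu_throughput + gpu_throughput)
    gpu_items = round(total_items * gpu_share)
    cpu_items = total_items - gpu_items
    return cpu_items, gpu_items

# A GPU measured at 4x the CPU's throughput gets 4/5 of the work.
cpu_items, gpu_items = split_work(1000, cpu_throughput=1.0,
                                  gpu_throughput=4.0)
# cpu_items == 200, gpu_items == 800
```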
“…In the future, we plan to extend this research toward systems with more GPUs as well as incorporation of UM into previous hybrid CPU+GPU implementations [4]. Another direction of research will include testing impact of various host architectures on performance of GPU processing.…”
Section: Summary and Future Work (mentioning, confidence: 99%)
“…Making the sequential data mining procedures into parallel processing friendly is the crucial part in using GPUs. The parallelization in amalgamated memory execution has Partitioning, Assignment and Execution modules [22]. The portioning module is responsible for splitting the data into sub-data packets for available GPU cores.…”
Section: B. GPU Based Data Mining (mentioning, confidence: 99%)
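The Partitioning module described in the citation above splits the input into sub-packets for the available cores. A minimal sketch of such a contiguous, near-equal partitioning (function and parameter names are illustrative, not from the cited work):

```python
def partition(data, num_workers):
    """Partitioning module sketch: split `data` into near-equal
    contiguous sub-packets, one per available worker.

    When len(data) does not divide evenly, the first `extra`
    workers receive one additional element each.
    """
    base, extra = divmod(len(data), num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

packets = partition(list(range(10)), 3)
# packets == [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The Assignment module would then map each sub-packet to a core, and the Execution module would launch the processing in parallel.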
“…The Execution module is used to trig the process initialization on the GPU cores in parallel. The optimization of the GPU kernel of the work "Parallelization of large vector similarity computations in a hybrid CPU+GPU environment" [22] is adopted in the proposed work. The GPU based frequent itemset extraction is given in Figure 1.…”
Section: B. GPU Based Data Mining (mentioning, confidence: 99%)
“…RELATED WORK CUDA application and system models, numerous examples and typical aforementioned optimizations are discussed in the literature [2], [3], also from the point of view of power/performance efficiency of different optimizations [5]. The particular problem addressed in this work can be applied to any GPU application that processes a sequence of independent input data sets for which communication and computations can be overlapped, for example a sequence of matrix multiplications, block-based matrix multiplication, computing similarities among a large number of multidimensional vectors [6] etc. Furthermore, results from this study can also be incorporated into frameworks that can automatically parallelize computations performed in batches.…”
Section: Introduction (mentioning, confidence: 99%)
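The overlap of communication and computation over a sequence of independent batches, as described in the citation above, can be mimicked on the host with a double-buffered pipeline. This is a loose analogy using threads in place of CUDA streams; `transfer` and `compute` are stand-in names, not APIs from the cited works:

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(batch):
    """Stand-in for the host-to-device copy of one input batch."""
    return list(batch)

def compute(device_batch):
    """Stand-in for the kernel processing one transferred batch."""
    return sum(device_batch)

def pipelined(batches):
    """Overlap the transfer of batch i+1 with computation on batch i,
    analogous to double buffering with asynchronous copies."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(transfer, batches[0])
        for nxt in batches[1:]:
            ready = pending.result()                # current copy done
            pending = copier.submit(transfer, nxt)  # start next copy
            results.append(compute(ready))          # overlaps the copy
        results.append(compute(pending.result()))
    return results

out = pipelined([[1, 2], [3, 4], [5, 6]])
# out == [3, 7, 11]
```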