2017
DOI: 10.1007/s11227-017-2159-7
Parallelization of large vector similarity computations in a hybrid CPU+GPU environment

Abstract: The paper presents design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computation of similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, and consequently its optimization is of high importance, especially on currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for computation of all vector pairs: tuning of a GPU kernel with consideration of memory coalescing and using …
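The core computational pattern of the paper, all-pairs similarity over many multidimensional vectors, can be sketched as follows. This is a minimal pure-Python illustration of the problem itself (here using cosine similarity as an example metric), not the paper's tuned OpenMP+CUDA implementation:

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def all_pairs_similarity(vectors):
    """Return {(i, j): similarity} for every unordered pair i < j.

    The number of pairs grows quadratically with the number of
    vectors, which is why parallelization pays off at scale.
    """
    return {(i, j): cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = all_pairs_similarity(vecs)
# sims[(0, 1)] == 0.0 (orthogonal vectors)
```

In the hybrid setting the paper targets, the pair index space of this loop is what gets partitioned between CPU threads and GPU kernels.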


Cited by 13 publications (6 citation statements)
References 23 publications
“…The former can be implemented with, e.g., OpenMP, OpenCL or Pthreads for CPUs and CUDA, OpenCL, OpenACC for GPUs, while the latter can be implemented typically with MPI. Paper [4] presents an exemplary implementation and optimization of parallelization of large vector similarity computations in a hybrid CPU+GPU environment, including load balancing and finding configuration parameters. CUDA-aware MPI implementations allow using CUDA buffers in MPI calls which simplifies implementation.…”
Section: Related Work and Motivations (mentioning, confidence: 99%)
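The load balancing mentioned in the citation above can be illustrated with a simple static split of work between CPU and GPU. This is a hedged sketch of the general idea only; the function name and throughput parameters are illustrative assumptions, and the cited paper's actual scheme involves tuned configuration parameters:

```python
def split_work(total_items, cpu_throughput, gpu_throughput):
    """Statically split `total_items` work units between CPU and GPU
    in proportion to measured throughputs (items per second).

    Illustrative only: throughput figures would come from
    calibration runs on the target hybrid system.
    """
    gpu_share = gpu_throughput / (cpu_throughput + gpu_throughput)
    gpu_items = round(total_items * gpu_share)
    cpu_items = total_items - gpu_items
    return cpu_items, gpu_items

# A GPU measured at 4x the CPU's throughput gets 4/5 of the work.
cpu_items, gpu_items = split_work(1000, cpu_throughput=1.0,
                                  gpu_throughput=4.0)
# cpu_items == 200, gpu_items == 800
```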
“…In the future, we plan to extend this research toward systems with more GPUs as well as incorporation of UM into previous hybrid CPU+GPU implementations [4]. Another direction of research will include testing impact of various host architectures on performance of GPU processing.…”
Section: Summary and Future Work (mentioning, confidence: 99%)
“…Making the sequential data mining procedures into parallel processing friendly is the crucial part in using GPUs. The parallelization in amalgamated memory execution has Partitioning, Assignment and Execution modules [22]. The portioning module is responsible for splitting the data into sub-data packets for available GPU cores.…”
Section: B. GPU Based Data Mining (mentioning, confidence: 99%)
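The Partitioning module described in the citation above splits the input into sub-packets for the available cores. A minimal sketch of such a contiguous, near-equal partitioning (function and parameter names are illustrative, not from the cited work):

```python
def partition(data, num_workers):
    """Partitioning module sketch: split `data` into near-equal
    contiguous sub-packets, one per available worker.

    When len(data) does not divide evenly, the first `extra`
    workers receive one additional element each.
    """
    base, extra = divmod(len(data), num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

packets = partition(list(range(10)), 3)
# packets == [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The Assignment module would then map each sub-packet to a core, and the Execution module would launch the processing in parallel.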
“…The Execution module is used to trig the process initialization on the GPU cores in parallel. The optimization of the GPU kernel of the work "Parallelization of large vector similarity computations in a hybrid CPU+GPU environment" [22] is adopted in the proposed work. The GPU based frequent itemset extraction is given in Figure 1.…”
Section: B. GPU Based Data Mining (mentioning, confidence: 99%)
“…RELATED WORK CUDA application and system models, numerous examples and typical aforementioned optimizations are discussed in the literature [2], [3], also from the point of view of power/performance efficiency of different optimizations [5]. The particular problem addressed in this work can be applied to any GPU application that processes a sequence of independent input data sets for which communication and computations can be overlapped, for example a sequence of matrix multiplications, block-based matrix multiplication, computing similarities among a large number of multidimensional vectors [6] etc. Furthermore, results from this study can also be incorporated into frameworks that can automatically parallelize computations performed in batches.…”
Section: Introduction (mentioning, confidence: 99%)
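The overlap of communication and computation over a sequence of independent batches, as described in the citation above, can be mimicked on the host with a double-buffered pipeline. This is a loose analogy using threads in place of CUDA streams; `transfer` and `compute` are stand-in names, not APIs from the cited works:

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(batch):
    """Stand-in for the host-to-device copy of one input batch."""
    return list(batch)

def compute(device_batch):
    """Stand-in for the kernel processing one transferred batch."""
    return sum(device_batch)

def pipelined(batches):
    """Overlap the transfer of batch i+1 with computation on batch i,
    analogous to double buffering with asynchronous copies."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(transfer, batches[0])
        for nxt in batches[1:]:
            ready = pending.result()                # current copy done
            pending = copier.submit(transfer, nxt)  # start next copy
            results.append(compute(ready))          # overlaps the copy
        results.append(compute(pending.result()))
    return results

out = pipelined([[1, 2], [3, 4], [5, 6]])
# out == [3, 7, 11]
```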