2014 IEEE 28th International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2014.89
BigKernel -- High Performance CPU-GPU Communication Pipelining for Big Data-Style Applications

Abstract: GPUs offer an order of magnitude higher compute power and memory bandwidth than CPUs. GPUs therefore might appear to be well suited to accelerate computations that operate on voluminous data sets in independent ways; e.g., for transformations, filtering, aggregation, partitioning or other "Big Data" style processing. Yet experience indicates that it is difficult, and often error-prone, to write GPGPU programs which efficiently process data that does not fit in GPU memory, partly because of the intricacies of G…

Cited by 22 publications (12 citation statements)
References 20 publications
“…Finally, the high bandwidth of GPU memory can only be exploited when GPU threads executing at the same time access memory in a coalesced fashion, i.e., when the threads simultaneously access adjacent memory locations. For efficient streaming data filtering, we applied "BigKernel" [17], a data communication scheme between CPU and GPU, to address the above issues. BigKernel uses a four-stage pipeline with an automated prefetching method to (i) optimize CPU-GPU communication and (ii) optimize GPU memory accesses.…”
Section: A Communication Scheme For Processing Geo-textual Streaming
confidence: 99%
“…We further employed a data streaming communication method [17] to optimize I/O overheads between GPU and CPU while continuously processing geo-textual streaming data.…”
confidence: 99%
“…Hence, its speedup can be a hundredfold or zero. Several ways to predict the performance of OpenCL kernels for different devices have been mentioned in three extensive surveys (Mokhtari and Stumm, 2014; Rossbach et al., 2013; Yan et al., 2009). Kernel profiling is a key technique used to get information about kernels to be classified, e.g.…”
Section: Introduction
confidence: 99%
“…Mokhtari et al. propose BigKernel [92], a compiler and runtime technique to address several challenges associated with data processing involving GPGPU. BigKernel also addresses the problem of uncoalesced memory accesses occurring in Big Data-style computations.…”
confidence: 99%