Proceedings of the 2017 ACM International Conference on Management of Data 2017
DOI: 10.1145/3035918.3035946
FPGA-based Data Partitioning

Abstract: Implementing parallel operators in multicore machines often involves a data partitioning step that divides the data into cache-size blocks and arranges them so as to allow concurrent threads to process them in parallel. Data partitioning is expensive, in some cases up to 90% of the cost of, e.g., a parallel hash join. In this paper we explore the use of an FPGA to accelerate data partitioning. We do so in the context of new hybrid architectures where the FPGA is located as a co-processor residing on a socket and …
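To make the partitioning step concrete, the following is a minimal sketch (not the paper's FPGA implementation) of software hash partitioning: tuples are scattered into buckets by key hash so that each bucket can later be processed independently, e.g. by one thread building a cache-resident hash table. The function name and tuple layout are illustrative assumptions.

```python
# Illustrative sketch of the hash-partitioning step that precedes a
# parallel hash join: scatter (key, value) tuples into buckets by key
# hash so each bucket can be processed by a separate thread.

def hash_partition(tuples, num_partitions):
    """Scatter (key, value) tuples into num_partitions buckets by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in tuples:
        # The hash of the key selects the destination partition.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

data = [(k, k * 10) for k in range(8)]
parts = hash_partition(data, 4)
# Every tuple lands in exactly one partition.
assert sum(len(p) for p in parts) == len(data)
```

In a real join, the partitioning pass dominates because it streams the entire input and scatters it with poor locality; this is the step the paper offloads to the FPGA co-processor.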

Cited by 42 publications (41 citation statements)
References 40 publications
“…While many FPGA accelerators discussed in Sect. 6 such as [49,61,125] have demonstrated that FPGAs can achieve high (multi-)kernel throughput, the overall performance is frequently limited by the low bandwidth between the FPGA and the host memory (or CPU). Most recent accelerator designs access the host memory data through PCIe Gen3, which provides a few GB/s of bandwidth per channel or a few tens of GB/s of accumulated bandwidth.…”
Section: Significant Communication Overhead
confidence: 99%
“…In this work, deep pipelining is used to hide the latency of multiple value comparisons. Kaan et al [61] proposed a hash partitioner that can contin-…”
Section: Hash Join
confidence: 99%
“…Sidler et al [53] have proposed an FPGA solution for accelerating database pattern matching queries; the proposed solution reduces query response time by 70%. Similarly, Kara et al [27] demonstrated how offloading the partitioning operation of the SQL join operator to the FPGA can significantly improve performance and offer a robust solution.…”
Section: Low-latency Data Processing Pipelines
confidence: 99%
“…This is possible because the ACCORDA accelerator is fast, small, and low-power so that a single accelerator is sufficient to support across many CPU cores (see Section 5), and still delivers high speedups (evaluated in Section 7.3). Most other hardware acceleration approaches are forced into looser integration [18,35,45,50] by power, and wind up with two worker types: accelerated and normal. Such an approach complicates scheduling, forcing query execution to switch between workers to exploit acceleration.…”
Section: Uniform Runtime Worker Model
confidence: 99%