Proceedings of the 2017 ACM International Conference on Management of Data 2017
DOI: 10.1145/3035918.3035946
FPGA-based Data Partitioning

Abstract: Implementing parallel operators in multicore machines often involves a data partitioning step that divides the data into cache-size blocks and arranges them so as to allow concurrent threads to process them in parallel. Data partitioning is expensive, in some cases up to 90% of the cost of, e.g., a parallel hash join. In this paper we explore the use of an FPGA to accelerate data partitioning. We do so in the context of new hybrid architectures where the FPGA is located as a co-processor residing on a socket and …
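To make the partitioning step concrete, the following is a minimal sketch (not the paper's FPGA implementation) of software hash partitioning: tuples are scattered into buckets by key hash so that each bucket can later be processed independently, e.g. by one thread building a cache-resident hash table. The function name and tuple layout are illustrative assumptions.

```python
# Illustrative sketch of the hash-partitioning step that precedes a
# parallel hash join: scatter (key, value) tuples into buckets by key
# hash so each bucket can be processed by a separate thread.

def hash_partition(tuples, num_partitions):
    """Scatter (key, value) tuples into num_partitions buckets by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in tuples:
        # The hash of the key selects the destination partition.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

data = [(k, k * 10) for k in range(8)]
parts = hash_partition(data, 4)
# Every tuple lands in exactly one partition.
assert sum(len(p) for p in parts) == len(data)
```

In a real join, the partitioning pass dominates because it streams the entire input and scatters it with poor locality; this is the step the paper offloads to the FPGA co-processor.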

Cited by 42 publications (41 citation statements)
References 40 publications
“…While many FPGA accelerators discussed in Sect. 6 such as [49,61,125] have demonstrated that FPGAs can achieve high (multi-)kernel throughput, the overall performance is frequently limited by the low bandwidth between the FPGA and the host memory (or CPU). Most recent accelerator designs access the host memory data through PCIe Gen3, which provides a few GB/s of bandwidth per channel or a few tens of GB/s of accumulated bandwidth.…”
Section: Significant Communication Overhead
confidence: 99%
“…In this work, deep pipelining is used to hide the latency of multiple value comparisons. Kaan et al [61] proposed a hash partitioner that can contin-…”
Section: Hash Join
confidence: 99%
“…Sidler et al [53] have proposed an FPGA solution for accelerating database pattern matching queries; the proposed solution reduces query response time by 70%. Similarly, Kara et al [27] demonstrated how offloading the partitioning operation of the SQL join operator to the FPGA can significantly improve performance and offer a robust solution.…”
Section: Low-latency Data Processing Pipelines
confidence: 99%
“…This is possible because the ACCORDA accelerator is fast, small, and low-power so that a single accelerator is sufficient to support across many CPU cores (see Section 5), and still delivers high speedups (evaluated in Section 7.3). Most other hardware acceleration approaches are forced into looser integration [18,35,45,50] by power, and wind up with two worker types: accelerated and normal. Such an approach complicates scheduling, forcing query execution to switch between workers to exploit acceleration.…”
Section: Uniform Runtime Worker Model
confidence: 99%