High-Throughput Lossless Compression on Tightly Coupled CPU-FPGA Platforms

Qiao, Weikang; Du, Jieqiong; Fang, Zhenman; Wang, Libo; Lö, Michael; Chang, M.F.; Cong, Jason

doi:10.1145/3174243.3174987

Cited by 20 publications

(22 citation statements)

References 0 publications


“…The data transformation wrapped as transactions to be processed between master memory on the host and global memory beside the FPGA kernel typically occurs across a peripheral high-speed serial interface. The PCIe (peripheral component interconnect express) -based heterogeneous runtime interactions require data delivered efficiently by using the direct memory access (DMA) to reduce the occupancy in the resources of the CPU [39]. The interactions with the accelerator managed by the data flow framework are restricted by bandwidth limitation contributing to non-full-speed acceleration, even though a high throughput processing kernel is deployed on the FPGA side.…”

Section: Optimized Architecture Design (mentioning)

confidence: 99%

Accelerating Faceting Wide-Field Imaging Algorithm with FPGA for SKA Radio Telescope as a Vast Sensor Array

Song, Zhu, Nan, et al. 2020

Sensors


The SKA (Square Kilometer Array) radio telescope will become the most sensitive telescope by correlating a huge number of antenna nodes to form a vast array of sensors in a region over one hundred kilometers. Faceting, the wide-field imaging algorithm, is a novel approach towards solving image construction from sensing data where earth surface curves cannot be ignored. However, the traditional processor of cloud computing, even if the most sophisticated supercomputer is used, cannot meet the extremely high computation performance requirement. In this paper, we propose the design and implementation of high-efficiency FPGA (Field Programmable Gate Array) -based hardware acceleration of the key algorithm, faceting in SKA by focusing on phase rotation and gridding, which are the most time-consuming phases in the faceting algorithm. Through the analysis of algorithm behavior and bottleneck, we design and optimize the memory architecture and computing logic of the FPGA-based accelerator. The simulation and tests on FPGA are done to confirm the acceleration result of our design and it is shown that the acceleration performance we achieved on phase rotation is 20× the result of the previous work. We then further designed and optimized an efficient microstructure of loop unrolling and pipeline for the gridding accelerator, and the designed system simulation was done to confirm the performance of our structure. The result shows that the acceleration ratio is 5.48 compared to the result tested on software in gridding parts. Hence, our approach enables efficient acceleration of the faceting algorithm on FPGAs with high performance to meet the computational constraints of SKA as a representative vast sensor array.


“…The mainstream PCIe-based CPU-FPGA platforms use direct memory access (DMA) for an FPGA to access the data from a CPU. The FPGA typically needs a memory controller IP to read the data from the CPU's DRAM to its own DRAM through PCIe [30]. In fact, this communication is limited by restrict bandwidth in practice to make it impractical to implement full-speed acceleration even if we have a high throughput preprocessing accelerator on the FPGA side.…”

Section: FPGA-based Hardware Design (mentioning)

confidence: 99%

Astronomical Data Preprocessing Implementation Based on FPGA and Data Transformation Strategy for the FAST Telescope as a Giant CPS

et al. 2020


The emergence of cyber-physical-social systems (CPSS) as a novel paradigm has revolutionized the relationship between humans, computers and the physical environment. CPSS extend cyber-physical systems (CPS) to include the social domain, which introduces the challenge of massive data processing. As a typical giant CPS, the Five-hundred-meter Aperture Spherical radio Telescope (FAST), the world's largest filled-aperture radio telescope, generates a massive volume of data. This poses a huge storage problem, one that CPSS face as well, and requires real-time data compression to reduce data storage and movement overhead. The recently introduced Bitshuffle preprocessing algorithm is a novel approach to exploiting spatial redundancy to improve the compression ratio with a specific compressor. However, the existing high-performance CPU-based solutions cannot satisfy the performance requirement and the power budget requirement simultaneously. In the paper, we propose an implementation of this algorithm on a Field Programmable Gate Array (FPGA) and present a unique data transformation strategy that converts raw FAST data in the classic FITS format into a format that supports huge file sizes, i.e. Hierarchical Data Format (HDF5). Evaluation results show that our implementation can achieve 3.2 Gbyte/s throughput and can be paired with an LZ4 compressor to form a high-performance compressor. This makes Bitshuffle on FPGAs a candidate for meeting the computational and energy efficiency constraints of radio telescopes and provides a reference for CPSS facing the same situation.

Index terms: CPSS, astronomical data, FPGA, Bitshuffle, FITS, HDF5.

I. Introduction

With the evolution of cyber-physical system (CPS) technologies [1], many interesting application domains have been explored, ranging from industry automation to aeronautics and astronautics. Taking human social characteristics into account, an emerging computing paradigm called cyber-physical-social system (CPSS) has focused on the exploration …

The associate editor coordinating the review of this manuscript and approving it for publication was Zahir Tari.
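The Bitshuffle preprocessing named in the abstract above can be illustrated with a minimal pure-Python sketch of a bit transpose. This is only an assumption-laden illustration of the general idea (grouping bit k of every element together so that a downstream compressor such as LZ4 sees long runs); it is not the Bitshuffle library's block layout, nor the FPGA design described in the cited paper. The function names `bitshuffle` and `bitunshuffle` are hypothetical.

```python
def bitshuffle(values, nbits):
    """Split unsigned integers into nbits bit planes.

    Plane k is returned as a Python int whose bit i equals bit k of
    values[i]. (Hypothetical helper for illustration only.)
    """
    planes = [0] * nbits
    for i, v in enumerate(values):
        for k in range(nbits):
            # Move bit k of element i to position i of plane k.
            planes[k] |= ((v >> k) & 1) << i
    return planes


def bitunshuffle(planes, count):
    """Invert bitshuffle: rebuild `count` integers from the bit planes."""
    values = [0] * count
    for k, plane in enumerate(planes):
        for i in range(count):
            # Move bit i of plane k back to bit k of element i.
            values[i] |= ((plane >> i) & 1) << k
    return values


if __name__ == "__main__":
    samples = [5, 4, 5, 7, 6, 5, 4, 5]  # small 16-bit readings
    planes = bitshuffle(samples, 16)
    # The high-order planes are all zero, which compresses well.
    print(planes)
    print(bitunshuffle(planes, len(samples)) == samples)
```

When the samples only vary in their low-order bits, most planes come out as zero, which is the kind of redundancy a byte-oriented compressor can exploit more easily after the transpose.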


“…It extends MonetDB with user-defined functions in FPGAs, along with proposing a Centaur framework [97] that provides software APIs to bridge the gap between CPUs and FPGAs. Other research work studies the acceleration of different operators including compression [104], decompression [35], sort [146] and joins [49], etc.…”

Section: Co-processor (mentioning)

confidence: 99%

In-memory database acceleration on FPGAs: a survey

et al. 2019


While FPGAs have seen prior use in database systems, in recent years interest in using FPGAs to accelerate databases has declined in both industry and academia for the following three reasons. First, specifically for in-memory databases, FPGAs integrated with conventional I/O provide insufficient bandwidth, limiting performance. Second, GPUs, which can also provide high throughput and are easier to program, have emerged as a strong accelerator alternative. Third, programming FPGAs requires developers to have full-stack skills, from high-level algorithm design to low-level circuit implementations. The good news is that these challenges are being addressed. New interface technologies connect FPGAs into the system at main-memory bandwidth, and the latest FPGAs provide local memory competitive in capacity and bandwidth with GPUs. Ease of programming is improving through support of shared coherent virtual memory between the host and the accelerator, support for higher-level languages, and domain-specific tools to generate FPGA designs automatically. Therefore, this paper surveys using FPGAs to accelerate in-memory database systems, targeting designs that can operate at the speed of main memory.
