Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication

Song, Linghao; Chi, Yuze; Sohrabizadeh, Atefeh; Choi, Y.; Lau, Jason; Cong, Jason

doi:10.1145/3490422.3502357

Cited by 41 publications

(18 citation statements)

References 61 publications

“…Serpens [20] and HiSparse [21] are two state-of-the-art SpMV accelerators on FPGAs and both target HBM. In both cases, however, the data type of their architecture is either fixed point or single-precision floating point.…”

Section: A Double-precision SpMV on FPGAs with HBM (mentioning)

confidence: 99%

Accelerating SpMV on FPGAs Through Block-Row Compress: A Task-Based Approach

Oliver, Álvarez, Cervero et al. 2023

2023 33rd International Conference on Field-Programmable Logic and Applications (FPL)

Sparse Matrix-Vector multiplication (SpMV), computing y = α · A × x + β · y where y and x are dense vectors, α and β are scalar constants, and A is a sparse matrix, is a key kernel in many HPC applications. Its memory access pattern is extremely hard to perform efficiently because the accesses are random. In this paper, we present a new approach to accelerate SpMV on FPGAs. As FPGAs lack a default memory hierarchy, they can adapt to specific applications better. Also, an increasing number of FPGAs include High Bandwidth Memory (HBM), making the SpMV problem especially appealing to tackle on this kind of device. We define a new sparse matrix encoding format (b8c) and its corresponding SpMV implementation using OmpSs@FPGA and HLS. This format allows us to leverage many of the FPGA's strengths for intensive data processing, such as data streaming, customizable datapath widths, parallel access to off-chip memory when multiple memory channels are available (as in HBM), parallel access to on-chip memory, and pipelining. We tested our proposal with both DDR and HBM memories to show the adaptability and scalability of our design. The presented b8c SpMV implementation achieves higher performance than the state-of-the-art FPGA implementation of SpMV on all the matrices in the data set, achieving 3.52x the performance on average, with a minimum of 1.82x and a maximum of 6.28x, even when running at 75% of the frequency.
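As a point of reference for the kernel defined in this abstract, the following is a minimal software sketch of y = α · A × x + β · y. It assumes the common CSR (compressed sparse row) layout rather than the paper's b8c format, whose details are not given here; the function name `spmv_csr` and the argument names are illustrative only. The indexed read `x[indices[k]]` is the random access that the abstract describes as hard to perform efficiently.

```python
def spmv_csr(alpha, indptr, indices, data, x, beta, y):
    """Return alpha * A @ x + beta * y for a matrix A stored in CSR form.

    indptr[r]..indptr[r+1] delimits the non-zeros of row r; indices holds
    their column positions and data holds their values.
    """
    result = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Column index comes from the matrix, so this read of x is irregular.
            acc += data[k] * x[indices[k]]
        result.append(alpha * acc + beta * y[row])
    return result


# Example matrix (an assumption for illustration):
# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 2, 0, 1]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = [1.0, 1.0, 1.0]
out = spmv_csr(2.0, indptr, indices, data, x, 0.5, y)
```

Each row's dot product is independent of the others, which is what hardware designs typically exploit for parallelism across processing elements.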

“…There have been different proposals for FPGA-specific matrix encodings and algorithms [16], [20], [21], [23], [25]. Some of these even allow on-the-fly transformation from multiple formats inside the FPGA [19].…”

Section: B Sparse Matrix Representation Formats (mentioning)

confidence: 99%

“…Thus, the main challenge is to deal with the mismatch between the throughput of the transferred non-zero matrix elements (i.e., the throughput of the off-chip memory bandwidth) and the throughput of the vector buffer. Previous work [4][3] holds multiple copies of the input vector to increase the throughput of the on-chip vector buffer. However, they fail to explore data reuse of fetched vector values which can further compress sparse matrices.…”

Section: SpMV and Challenges (mentioning)

confidence: 99%

“…There are some works targeting HBM-based FPGAs. Serpens [4] proposes memory-centric processing engines to fully exploit the benefits of HBM. It proposes an index coalescing technique to improve URAM utilization and non-zero reordering to avoid URAM address conflicts.…”

Section: Related Work (mentioning)

confidence: 99%

Accelerating sparse matrix operations on FPGAs with on/off-chip memories

This thesis contains material from 5 paper(s) published or accepted in the following peer-reviewed journals and conferences in which I am listed as an author.

Chapter 3 is partially published as Shiqing Li, Di Liu and Weichen Liu, "Optimized Data Reuse via Reordering for Sparse Matrix-Vector Multiplication on FPGAs" in International Conference on Computer-Aided Design (ICCAD) 2021 and is partially published as Shiqing Li, Di Liu and Weichen Liu, "Efficient FPGA-based Sparse Matrix-Vector Multiplication with Data Reuse-aware Compression" in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) 2023. The contributions of the co-authors are as follows:
• Prof. Weichen Liu provided the initial idea of accelerating sparse-matrix dense-vector multiplication on embedded FPGAs.
• I proposed the data reordering algorithm and the data reuse-aware compression, and I conducted the experiments on FPGAs. Dr. Di Liu and Prof. Weichen Liu provided insightful comments about the idea and experiments.
• I drafted the manuscript. Dr. Di Liu and Prof. Weichen Liu helped polish the manuscript.

Chapter 4 is partially published as Shiqing Li and Weichen Liu, "Accelerating Gustavson-based SpMM on Embedded FPGAs with Element-wise Parallelism and Access Pattern-aware Caches", in Design, Automation and Test in Europe (DATE) 2023 and is partially published as Shiqing Li, Shuo Huai and Weichen Liu, "An Efficient Gustavson-based Sparse Matrix-matrix Multiplication Accelerator on Embedded FPGAs", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) 2023. The contributions of the co-authors are as follows:
• Prof. Weichen Liu provided the initial idea of accelerating sparse-matrix sparse-matrix multiplication on embedded FPGAs.
• I proposed performing Gustavson's algorithm with element-wise parallelism, access pattern-aware caches and optimized mergers, and I conducted the experiments on FPGAs. Shuo Huai helped conduct the experiments.
• I drafted the manuscript. Prof. Weichen Liu and Shuo Huai helped polish the manuscript.
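For readers unfamiliar with the Gustavson formulation named above, the following is a minimal software sketch of the row-wise product it is based on: each output row C[i, :] is accumulated as the sum of A[i, k] · B[k, :] over the non-zeros of row i of A. The dict-of-rows storage and the name `gustavson_spgemm` are assumptions made for illustration; this is not the thesis's accelerator design, and it does not model the element-wise parallelism, caches, or mergers mentioned there.

```python
def gustavson_spgemm(a_rows, b_rows):
    """Multiply two sparse matrices stored as {row: {col: value}} dicts.

    Row-wise (Gustavson) order: for every non-zero A[i, k], scale row k
    of B and merge it into the accumulator for output row i.
    """
    c_rows = {}
    for i, a_row in a_rows.items():
        acc = {}  # sparse accumulator for output row i
        for k, a_ik in a_row.items():
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            c_rows[i] = acc
    return c_rows


# Small illustrative inputs (assumed, not from the source):
# A = [[1, 0, 2],      B = [[0, 4],
#      [0, 3, 0]]           [5, 0],
#                           [0, 6]]
a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
b = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
c = gustavson_spgemm(a, b)
```

Because only rows of B selected by non-zeros of A are touched, the access pattern into B depends on the data, which is one reason caching and merging strategies matter for this formulation.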
