An efficient sparse conjugate gradient solver using a Beneš permutation network

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Luk

2016

Self Cite

Sparse matrix vector multiplication (SpMV) is an important kernel in many scientific applications. To improve the performance and applicability of FPGA-based SpMV, we propose an approach for exploiting properties of the input matrix to generate optimised custom architectures. The architectures generated by our approach are between 3.8 and 48 times faster than the worst-case architectures for each matrix, showing the benefits of instance-specific design for SpMV.
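
For readers unfamiliar with the kernel, the following is a minimal software sketch of SpMV over a matrix in compressed sparse row (CSR) form, together with one example of a matrix property that could be extracted in a single linear pass. The CSR layout, the function names and the chosen property (maximum nonzeros per row) are illustrative assumptions, not the storage format or the analysis used in the paper.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A * x for a matrix A stored in CSR form (assumed layout)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def max_nonzeros_per_row(row_ptr):
    """Hypothetical example of a matrix property found in one linear pass."""
    return max(row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1))


# Example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
widest_row = max_nonzeros_per_row(row_ptr)
```

A property such as the widest row is the kind of instance-specific information that could, in principle, be used to size a reduction circuit for one matrix rather than for the worst case.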

Cask

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Luk

2016

Self Cite

“…In this regard, FPGAs may have a considerable advantage compared to general purpose architectures: the fine degree of customisation available can be used to directly and carefully orchestrate data movement on and off-chip resulting in good performance on the SpMV kernel [4]- [6]. Furthermore, when using FPGAs there is great potential for application and domain driven customisation: wordlengths, reduction circuits, memory controller infrastructure can all be optimised to direct resources to the most critical component [7]- [11].…”

Section: Introduction

mentioning

confidence: 99%

“…[16] proposes one of the first parametric designs for floating point SpMV and demonstrates how the flexibility of FPGAs can be used to achieve good performance compared to general purpose systems. More recently, the focus has shifted to efficient use of on-chip memory resources and DRAM bandwidth utilisation [5], [7], [9]. Recently, compression techniques have been proposed to improve the performance on memory bound matrices [8], [17]. The constant sparsity structure in the context of iterative methods has also been exploited to optimise FPGA architectures for SpMV [18].…”

mentioning

confidence: 99%

“…Recently, compression techniques have been proposed to improve the performance on memory bound matrices [8], [17]. The constant sparsity structure in the context of iterative methods has also been exploited to optimise FPGA architectures for SpMV [18]. Static one-off pre-processing techniques are cost-effective for FPGA implementations if they can lead either to a simplified architecture [5], [7], [19] or reduced communication overhead [8], [17]. Linear or log-linear preprocessing techniques with good performance in practice, such as the method used in this work for extracting matrix properties, have been found to be effective.…”

mentioning

confidence: 99%

Optimising Sparse Matrix Vector multiplication for large scale FEM problems on FPGA

2016 26th International Conference on Field Programmable Logic and Applications (FPL)

Luk

et al. 2016

Self Cite

Abstract: Sparse Matrix Vector multiplication (SpMV) is an important kernel in many scientific applications. In this work we propose an architecture and an automated customisation method to detect block diagonal sparse matrices and optimise the architecture for them. We evaluate the proposed approach in the context of the spectral/hp Finite Element Method, using the local matrix assembly approach. This problem leads to a large sparse system of linear equations with a block diagonal matrix, which is typically solved using an iterative method such as the Preconditioned Conjugate Gradient method. The efficiency of the proposed architecture, combined with the effectiveness of the proposed customisation method, reduces BRAM resource utilisation by as much as 10 times, while matching the throughput of existing state-of-the-art designs and requiring minimal development effort from the end user. In the context of the Finite Element Method, our approach enables the solution of larger problems than previously possible, making FPGAs applicable to more interesting HPC problems.
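
As a rough illustration of what detecting a block diagonal structure can involve, the sketch below checks whether every nonzero of a square matrix falls inside a diagonal block of a given uniform size, and searches for the smallest such size. The uniform block size, the coordinate-list input and both function names are assumptions made for this example; the abstract does not describe the paper's actual detection method.

```python
def fits_block_diagonal(coords, block_size):
    """True if every nonzero (row, col) lies inside a diagonal block.

    Assumes uniform square blocks aligned to multiples of block_size.
    """
    return all(r // block_size == c // block_size for r, c in coords)


def smallest_block_size(coords, n):
    """Smallest uniform block size dividing n that covers all nonzeros.

    The loop always returns, because block_size == n (one dense block)
    trivially covers every coordinate of an n x n matrix.
    """
    for block_size in range(1, n + 1):
        if n % block_size == 0 and fits_block_diagonal(coords, block_size):
            return block_size


# 4 x 4 example with two 2 x 2 diagonal blocks.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (3, 2), (3, 3)]
detected = smallest_block_size(coords, 4)

# One nonzero outside the blocks forces the search up to a single 4 x 4 block.
detected_with_outlier = smallest_block_size(coords + [(0, 3)], 4)
```

Each candidate size is checked in one pass over the nonzeros, so the cost stays modest for the one-off preprocessing setting the abstract describes.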

dfesnippets: An Open-Source Library for Dataflow Acceleration on FPGAs

Lecture Notes in Computer Science

Arram

et al. 2017

Self Cite

Abstract. Highly tuned FPGA implementations can achieve significant performance and power efficiency gains over general purpose hardware. However, limited development productivity has prevented mainstream adoption of FPGAs in many areas such as High Performance Computing. High-level standard development libraries are increasingly adopted to improve productivity. We propose an approach for performance-critical applications that includes standard library modules, benchmarking facilities and application benchmarks to support a variety of use cases. We implement the proposed approach as an open-source library for a commercially available FPGA system and highlight applications and productivity gains.