2020 IEEE/ACM International Workshop on Heterogeneous High-Performance Reconfigurable Computing (H2RC) 2020
DOI: 10.1109/h2rc51942.2020.00007

Evaluating FPGA Accelerator Performance with a Parameterized OpenCL Adaptation of Selected Benchmarks of the HPCChallenge Benchmark Suite

Citations: cited by 19 publications (11 citation statements)
References: 10 publications
“…We also support this parameter in the newly added benchmarks. A more detailed description of the build process is given in our previous work [13] and in the online documentation. 2 Different hardware interfaces can be utilized for inter-FPGA communication with recent FPGA boards.…”
Section: Parallel Implementation of HPC Challenge Benchmarks for FPGA
Citation type: mentioning
confidence: 99%
“…The base implementation of the HPL benchmark uses a similar two-leveled blocked approach than the GEMM benchmark described in [13]. Thus, it uses two parameters to specify the block sizes of the local memory buffers and of the compute units as described in Table 4.…”
Section: Intel
Citation type: mentioning
confidence: 99%
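
The two-level blocking mentioned in the statement above can be illustrated with a minimal sketch. It is not the cited HPL or GEMM kernel (those are OpenCL designs for FPGAs). It only shows, in plain Python, how two block-size parameters nest. The names `blocked_matmul`, `outer_block`, and `inner_block` are hypothetical: `outer_block` stands in for the block size of a local memory buffer, and `inner_block` for the block size handled by one compute unit.

```python
def blocked_matmul(a, b, n, outer_block, inner_block):
    """Multiply two n x n matrices (lists of lists) with two levels of blocking.

    outer_block and inner_block are illustrative parameters, not the
    parameter names used by the cited benchmark suite.
    """
    if n % outer_block != 0 or outer_block % inner_block != 0:
        raise ValueError("block sizes must evenly divide the matrix size")

    c = [[0] * n for _ in range(n)]

    # Level 1: walk over tiles sized like a local memory buffer.
    for i0 in range(0, n, outer_block):
        for j0 in range(0, n, outer_block):
            for k0 in range(0, n, outer_block):
                # Level 2: split each tile into compute-unit-sized sub-tiles.
                for i1 in range(i0, i0 + outer_block, inner_block):
                    for j1 in range(j0, j0 + outer_block, inner_block):
                        for k1 in range(k0, k0 + outer_block, inner_block):
                            # Innermost work: a small dense multiply-accumulate.
                            for i in range(i1, i1 + inner_block):
                                for j in range(j1, j1 + inner_block):
                                    acc = c[i][j]
                                    for k in range(k1, k1 + inner_block):
                                        acc += a[i][k] * b[k][j]
                                    c[i][j] = acc
    return c
```

The result is identical to an unblocked multiplication. Blocking only changes the order in which partial products are accumulated, which is what allows a hardware design to keep one tile in fast local memory while the compute units work through its sub-tiles.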
“…As we are mimicking the code structure of previous GPU and CPU implementations this means that we need to execute 2 unaligned loads and stores. Related work [34] evaluating fully random access patterns with nonaligned loads and stores has shown a performance of around 60M transactions per second per DDR memory bank on the Stratix 10 architecture, corresponding to around 5 clock cycles per pair of read and write operations relative to the 300MHz of the memory interface. In the gather-scatter operation for SEM, the pattern is not fully random, but also not strictly pairwise, as multiple reads have to be completed before the sums are written back to the respective locations.…”
Section: Maximizing Memory Bandwidth
Citation type: mentioning
confidence: 99%
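
The "around 5 clock cycles" figure in the statement above follows directly from the two quoted numbers. The short check below reproduces that arithmetic. It assumes, as the statement does, that one transaction is one read/write pair. The constant and function names are hypothetical.

```python
# Figures quoted in the citation statement (approximate values).
MEMORY_CLOCK_HZ = 300_000_000          # 300 MHz memory interface
TRANSACTIONS_PER_SECOND = 60_000_000   # about 60M transactions/s per DDR bank


def cycles_per_transaction(clock_hz, transactions_per_second):
    """Return the average number of clock cycles spent per transaction."""
    return clock_hz / transactions_per_second


cycles = cycles_per_transaction(MEMORY_CLOCK_HZ, TRANSACTIONS_PER_SECOND)
print(cycles)  # 5.0
```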