2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES) 2013
DOI: 10.1109/cases.2013.6662507

Compiled multithreaded data paths on FPGAs for dynamic workloads

Abstract: Hardware-supported multithreading can mask memory latency by switching execution to ready threads, which is particularly effective on irregular applications. FPGAs provide an opportunity to have multithreaded data paths customized to each individual application. In this paper we describe the compiler generation of these hardware structures from a C subset targeting a Convey HC-2ex machine. We describe how this compilation approach differs from other C to HDL compilers. We use the compiler to genera…

citations
Cited by 16 publications
(6 citation statements)
references
References 21 publications
“…7, the proposed accelerator can obtain higher performance for most of the test matrices, compared with the implementations on the Convey HC2ex platform with four Virtex-6 LX760 FPGAs [13], HC-1 [12] and Tesla S1070 [7]. With the number of the nonzero block in one block row and the density of one increasing, the performance improvement can be higher.…”
Section: Performance Comparison (mentioning)
confidence: 96%
“…K. Nagar, et al [12] implemented SpMV for large-scale sparse matrices on the Convey HC-1 with a novel streaming multiply-accumulator and local vector cache. Further, A hardware multithreaded implementation of SpMV on the Convey HC2ex, which makes use of multiple outstanding memory requests to mask the long latencies and multiple Computation Engines to process multiple rows in parallel [13]. However, the performance improvement of the above two implementations mainly depend on the high bandwidth and multiple memory controllers, which are greatly excessive of other platforms.…”
Section: Related Work (mentioning)
confidence: 99%
“…Synthesis of multithreaded accelerators. Halstead and Najjar extend the ROCCC HLS compiler with the CHAT methodology to generate temporally multithreaded accelerators starting from loops constructs [23]. However, they do not address atomic memory operations and focus on the simple case study of pointer chasing.…”
Section: Related Work (mentioning)
confidence: 99%
“…Another idea is to maximize the utilization of a single hardware accelerator, extending its functionality to support hardware threads and hide latencies in pipelined loops. Halstead and Najjar extend the ROCCC HLS compiler to generate multi-threaded accelerators starting from loops constructs [22]. The programming model is similar to OpenMP for loops, and the generated architecture uses hardware context-switches to hide variable latencies due to memory accesses in irregular applications.…”
Section: High-level Synthesis Of Multi-threaded Programs (mentioning)
confidence: 99%