2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) 2017
DOI: 10.1109/fccm.2017.33

Using Runahead Execution to Hide Memory Latency in High Level Synthesis

Abstract: Reads and writes to global data in off-chip RAM can limit the performance achieved with HLS tools, as each access takes multiple cycles and usually blocks progress in the application state machine. This can be combated by using data prefetchers, which hide access time by predicting the next memory access and loading it into a cache before it's required. Unfortunately, current prefetchers are only useful for memory accesses with known regular patterns, such as walking arrays, and are ineffective for those that …

Citations: cited by 5 publications (5 citation statements)
References: 12 publications
“…The generated Boogie program only describes the program behaviour of the partitioned memory for the memory arbitration problem. EASY uses the pre-existing slicing tool by Fleming and Thomas [31] to automatically extract the memory behaviour from the input code. The sliced code is a list of instructions that affects the partitioned memory access, disregarding all other irrelevant instructions in the thread function.…”
Section: Generating a Boogie Program (mentioning)
confidence: 99%
“…RELISH (Runahead Execution of Load Instructions via Sliced Hardware) [36] is a LegUp HLS optimization pass which constructs a "pslice" (precomputation slice) for an accelerator. A "pslice" is an executable portion of an original program which only includes certain operations, in this case every long latency global load in the accelerated function.…”
Section: Taxonomy Of Existing Projects (mentioning)
confidence: 99%
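
To make the "pslice" idea in the statement above concrete, here is a minimal hand-written sketch in C. It is an illustration only, not the output of RELISH or LegUp: the `node` type and the functions `sum_scaled` and `sum_scaled_pslice` are hypothetical names chosen for this example. The slice keeps the pointer-chasing loads of the original loop and drops the arithmetic, which is the kind of irregular access pattern the abstract says conventional prefetchers handle poorly.

```c
#include <stddef.h>

/* Hypothetical node type. The next address is only known after the
   current node has been loaded, so the access pattern is data-dependent. */
struct node {
    int value;
    struct node *next;
};

/* Original kernel: each node dereference is a potentially
   long-latency global load, followed by dependent arithmetic. */
int sum_scaled(const struct node *head, int scale) {
    int sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        sum += n->value * scale;
    }
    return sum;
}

/* Hand-written precomputation slice (an assumption for illustration):
   only the address-generating chain and the loads remain. The loaded
   value is discarded; the point is to touch the data early. The count
   of touched nodes is returned only so the effect can be observed. */
size_t sum_scaled_pslice(const struct node *head) {
    size_t touched = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        volatile int warm = n->value; /* load kept, result unused */
        (void)warm;
        touched++;
    }
    return touched;
}
```

Because the slice does less work per iteration than the original loop, it can in principle get ahead of it and have each node's data fetched before the original loop needs it. How the slice is scheduled alongside the original in hardware is not described in the statement above, so this sketch leaves that out.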
“…Unrolling a loop increases the number of operations that use the same memory, turning memories into bottlenecks since the compiler does not infer more memories or more ports to the existing ones. To avoid this problem, memory buffers [34,35], partition [36], or run-ahead [37] techniques can be applied.…”
Section: Scalability (mentioning)
confidence: 99%