2013
DOI: 10.1007/978-3-642-36812-7_12

Architecture for Transparent Binary Acceleration of Loops with Memory Accesses

Abstract: This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are accelerated by a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain able to automatically generate a RPU tailored for the execution of one or more Megablocks detected offline. Switching between hardware and software execution is done transparently, without modifications to source code or executable binaries. Our approach has been evaluated …

Citations: cited by 3 publications (3 citation statements)
References: 9 publications

“…The current version of the VHDL generation tool does not deal with Megablocks with memory operations. Note that non-pipelined Megablocks have been already tested and evaluated using an FPGA board (see [14]), and there is recent work focused on adding support to memory operations [16].…”
Section: Results
mentioning
confidence: 99%
“…This work focuses on the third point, transparent control flow transfer in the context of HPC platforms, specifically the Xeon + FPGA platform [9]. Several proposals have been made for a mechanism capable of transparently transferring control flow between a CPU and an accelerator, but most of them target embedded platforms [10][11][12]. Embedded platforms benefit from a heterogeneous computational model, but allow much more fine-grained control, leading to approaches that are not directly transferable to HPC.…”
mentioning
confidence: 99%
“…An input trigger is then sent to the BERET co-processor along with some configuration data. Another approach is proposed in [12], where a local memory bus injector is responsible for triggering the transfer of the control flow. This block watches the instruction bus, and when it detects the start of a hotspot, it modifies the instruction flow to invoke a subroutine that takes over the transfer.…”
mentioning
confidence: 99%
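
The injector mechanism described in the last citation statement can be pictured with a small software model. This is only an illustrative sketch, not the cited architecture: the `BusInjector` class, the `Instruction` type, the `call` opcode, and every address below are hypothetical names chosen for the example. The model sits between a CPU and its instruction memory, passes fetches through unchanged, and substitutes a call to a handover subroutine when the fetch address matches the start of a known hotspot.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Instruction:
    """A toy instruction: an opcode and one integer operand (hypothetical format)."""
    opcode: str
    operand: int = 0


class BusInjector:
    """Toy model of a block that watches the instruction bus.

    `hotspots` maps the start address of each detected hotspot to the
    address of the subroutine that hands control to the accelerator.
    Both kinds of address are made-up values for illustration.
    """

    def __init__(self, hotspots: Dict[int, int]) -> None:
        self.hotspots = dict(hotspots)

    def fetch(self, memory: Dict[int, Instruction], address: int) -> Instruction:
        """Return the instruction the CPU would see for this fetch."""
        if address in self.hotspots:
            # Start of a hotspot: replace the fetched instruction with a
            # call to the handover subroutine.
            return Instruction("call", self.hotspots[address])
        # Any other address: pass the original instruction through.
        return memory[address]


# A three-instruction program; the loop starting at 0x14 is the hotspot.
memory = {
    0x10: Instruction("add", 1),
    0x14: Instruction("load", 8),
    0x18: Instruction("branch", 0x14),
}
injector = BusInjector({0x14: 0x100})

seen = [injector.fetch(memory, addr) for addr in (0x10, 0x14, 0x18)]
```

In this model the program in `memory` is never modified; only the value returned for the hotspot's first fetch differs, which is the sense in which the transfer is transparent to the binary.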