Nuno Paulino scite author profile

et al. 2013

IEEE Trans. Ind. Inf.

This paper presents a novel approach to accelerate program execution by mapping repetitive traces of executed instructions, called Megablocks, to a runtime reconfigurable array of functional units. An offline tool suite extracts Megablocks from microprocessor instruction traces and generates a Reconfigurable Processing Unit (RPU) tailored for the execution of those Megablocks. The system is able to transparently move computations from the microprocessor to the RPU at runtime. A prototype implementation of the system using a cacheless MicroBlaze microprocessor running code located in external memory reaches speedups from to for a set of 14 benchmark kernels. For a system setup which maximizes microprocessor performance by having the application code located in internal block RAMs, speedups from to were estimated.

Transparent Runtime Migration of Loop-Based Traces of Processor Instructions to Reconfigurable Processing Units

Bispo

International Journal of Reconfigurable Computing

et al. 2013

The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26× to 3.69× were achieved for the best alternative using a MicroBlaze processor with local memory.

Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces

Ferreira

2017

IEEE Trans. VLSI Syst.

From Instruction Traces to Specialized Reconfigurable Arrays

Bispo

et al. 2011