VLIW-SCORE: Beyond C for sequential control of SPICE FPGA acceleration

Kapre, Nachiket; DeHon, André

doi:10.1109/fpt.2011.6132678

Cited by 23 publications

(14 citation statements)

References 18 publications

(21 reference statements)


“…Although a hard processors such as the PowerPC CPUs in Virtex-2 Pro and the ARM cores in the newer Zynq SoCs can offer better performance than an equivalent soft processor, they are inappropriate for situations where lightweight control and co-ordination [13] are required. Their fixed position in the FPGA fabric can also complicate design, and they demand supporting infrastructure for logic interfacing.…”

Section: Related Work (mentioning)

confidence: 98%

“…Processors find extensive use within FPGA systems, from management of system execution and interfacing, to implementation of iterative algorithms outside of the performance-critical datapath [13]. In recent work, soft processors have been demonstrated as a viable abstraction of hardware resources, allowing multi-processor systems to be built and programmed easily.…”

Section: Introduction (mentioning)

confidence: 99%


On Data Forwarding in Deeply Pipelined Soft Processors

Yan

Fahmy

Kapre

2015

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Self Cite


We can design high-frequency soft-processors on FPGAs that exploit deep pipelining of DSP primitives, supported by selective data forwarding, to deliver up to 25% performance improvements across a range of benchmarks. Pipelined, in-order, scalar processors can be small and lightweight but suffer from a large number of idle cycles due to dependency chains in the instruction sequence. Data forwarding allows us to more deeply pipeline the processor stages while avoiding an associated increase in the NOP cycles between dependent instructions. Full forwarding can be prohibitively complex for a lean soft processor, so we explore two approaches: an external forwarding path around the DSP block execution unit in FPGA logic and using the intrinsic loopback path within the DSP block primitive. We show that internal loopback improves performance by 5% compared to external forwarding, and up to 25% over no data forwarding. The result is a processor that runs at a frequency close to the fabric limit of 500 MHz, but without the significant dependency overheads typical of such processors.
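The abstract's point about idle cycles between dependent instructions can be illustrated with a toy model. The sketch below is not the processor from the cited paper: the function name `count_nops`, the dependency list, and the latency values are all hypothetical, chosen only to show how a shorter producer-to-consumer latency (as forwarding might provide) reduces NOP cycles in an in-order, single-issue pipeline.

```python
def count_nops(deps, latency):
    """Count idle (NOP) cycles in a toy in-order, single-issue pipeline.

    deps[i] is the index of the earlier instruction that instruction i
    depends on, or None if it has no dependency.
    latency is the assumed number of cycles between issuing a producer
    and the earliest cycle at which its consumer may issue.
    """
    issue_cycle = []  # cycle at which each instruction issues
    nops = 0
    for producer in deps:
        # In-order issue: at best, one cycle after the previous instruction.
        earliest = issue_cycle[-1] + 1 if issue_cycle else 0
        ready = earliest
        if producer is not None:
            # Wait until the producer's result is available.
            ready = max(earliest, issue_cycle[producer] + latency)
        nops += ready - earliest
        issue_cycle.append(ready)
    return nops


# Hypothetical 5-instruction sequence with a dependency chain 0 -> 1 -> 2 -> 4.
chain = [None, 0, 1, None, 2]

no_forwarding = count_nops(chain, latency=4)    # assumed deep-pipeline latency
with_forwarding = count_nops(chain, latency=2)  # assumed shortened latency
print(no_forwarding, with_forwarding)
```

Under these assumed latencies the same instruction sequence stalls far less when the consumer can pick up the result earlier, which is the effect the abstract attributes to forwarding. The specific numbers carry no meaning beyond the illustration.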


“…Stream graphs have been mapped to different architectures, including the Cell processor [20], GP-GPUs [12,21], FPGA [22,23], and other reconfigurable architectures [24]. Those techniques often focus on exploiting task-level parallelism (TLP) and balancing workloads among multiple processors.…”

Section: Related Work (mentioning)

confidence: 99%

Optimizing stream program performance on CGRA-based systems

Lee

Nguyễn

Lee

2015

Proceedings of the 52nd Annual Design Automation Conference


Coarse-Grained Reconfigurable Architectures (CGRAs), often used as coprocessors for DSP and multimedia kernels, can deliver highly energy-efficient execution for compute-intensive kernels. Simultaneously, stream applications, which consist of many actors and channels connecting them, can provide natural representations for DSP applications, and therefore be a good match for CGRAs. We present our results of mapping DSP applications written in StreamIt language to CGRAs, along with our mapping flow. One important challenge in mapping is how to manage the multitude of kernels in the application for the limited local memory of a CGRA, for which we present a novel integer linear programming-based solution. Our evaluation results demonstrate that our software and hardware optimizations can help generate highly efficient mapping of stream applications to CGRAs, enabling far more energy-efficient executions (7× worse to 50× better) compared to using state-of-the-art GP-GPUs.


“…[4] describe a VLIW soft processor that can be dynamically reconfigured (using Xilinx partial reconfig support) to be either a single 4-wide core or two 2-wide cores. Most closely related is VLIW-SCORE [1], a VLIW, pipelined architecture for utilizing floating point units, that compiler-schedules operations via software pipelining. One key difference vs TILT is storage organization: VLIW-SCORE organizes storage as a pair of operand memories in front of each functional unit, with a time-multiplexed network connecting functional unit outputs to operand-memory inputs; in contrast, TILT organizes storage as banks of memory having contiguous per-thread address spaces.…”

Section: Introduction (mentioning)

confidence: 99%

TILT: A multithreaded VLIW soft processor family

Ovtcharov

Tili

Steffan

2013

2013 23rd International Conference on Field Programmable Logic and Applications


Fig. 1. The TILT Architecture, consisting of a scratchpad, banked, multi-ported memory system and FUs connected by crossbar networks.

Abstract: We propose TILT, an FPGA-based compute engine designed to highly-utilize multiple, varied, and deeply-pipelined functional units by leveraging thread-level parallelism and static compiler analysis and scheduling. For this work we focus on deeply-pipelined floating-point functional units of widely-varying latency, executing Hodgkin-Huxley neuron simulation as an example application, compiled with our LLVM-based scheduler. Targeting a Stratix IV FPGA, we explore architectural trade-offs by measuring area and throughput for designs with varying numbers of functional units, thread contexts, and memory banks.

