MAMPSx: A design framework for rapid synthesis of predictable heterogeneous MPSoCs

Fernando, Shakith; Siyoum, Firew; He, Yifan; Kumar, Akash; Corporaal, Henk

doi:10.1109/rsp.2013.6683970

Cited by 6 publications

(2 citation statements)

References 17 publications

(12 reference statements)

“…For SPINE, which uses the same skeleton template for all values of P, we observe a much flatter scaling. It also outperforms a manual implementation (described in [16]) that has an input buffer with equal width as the AXI interface and requires five cycles (4 load, 1 execute, non-pipelined) per execution of 128 parallel PEs. If this manual accelerator would perform only 32-parallel operations, only a single load-cycle would be required, and it would be at a point similar to P = 32 for SPINE.…”

Section: Discussion (mentioning)

confidence: 99%

SPINE: From C loop-nests to highly efficient accelerators using Algorithmic Species

Wijtvliet

Fernando

Corporaal

2015

2015 25th International Conference on Field Programmable Logic and Applications (FPL)

Self Cite

In modern embedded systems, heterogeneous architectures are crucial in achieving desired performance requirements under area and energy constraints. Many of these systems combine a multi-processor system-on-chip and a Field Programmable Gate Array to enable hardware acceleration. Although the introduction of High-Level Synthesis significantly reduced the complexity of utilizing these systems, a programmer is still required to have expert knowledge of both the High-Level Synthesis tool and the target hardware and to perform time-consuming manual iterations to achieve efficient implementations. In this paper we present SPINE, a design flow for automatic generation of efficient hardware accelerators based on Algorithmic Species. SPINE allows the designer to focus on the algorithm by automatically applying hardware-specific optimizations and parallelization techniques to the design. As a case study, we present a design space exploration of nine different loop-nests used in image processing kernels and show how SPINE rapidly generates multiple area-performance trade-offs. Furthermore, we compare our results to the state of the art and show that SPINE is a promising direction for accelerator generation, as the average performance and area improvements with SPINE are 107% and 75%, respectively, over the state of the art.

“…The implementation target was the Zynq XC7Z045 FPGA device. Figure 5 shows the area-performance trade-offs for the three kernels using: (1) a single design point from a hand-coded RTL implementation [12], (2) a single design point from the Vivado OpenCV video library [9], (3) multiple design points (P 0 = 1..128) generated from naive C through HLS (HLS-C), and (4) multiple design points generated from C' through HLS using (AS)². The post place-and-route results are presented.…”

Section: Methods (mentioning)

confidence: 99%

(AS)²: Accelerator Synthesis using Algorithmic Skeletons for Rapid Design Space Exploration

Fernando

Wijtvliet

Nugteren

et al. 2015

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015

Self Cite

Hardware accelerators in heterogeneous multiprocessor system-on-chips are becoming popular as a means of meeting performance and energy efficiency requirements of modern embedded systems. Current design methods for accelerator synthesis, such as High-Level Synthesis, are not fully automated. Therefore, time-consuming manual iterations are required to explore efficient accelerator alternatives: the programmer is still required to think in terms of the underlying architecture. In this paper, we present (AS)²: a design flow for Accelerator Synthesis using Algorithmic Skeletons. Skeletonization separates the structure of a parallel computation from an algorithm's functionality, enabling efficient implementations without requiring the programmer to have hardware knowledge. We define three such skeletons (for three image processing kernels) enabling FPGA-specific parallelization techniques and optimizations. As a case study, we present a design space exploration of these skeletons and show how multiple design points with area-performance trade-offs for the accelerators can be efficiently and rapidly synthesized. We show that (AS)² is a promising direction for accelerator synthesis as it generates a Pareto front of 8 design points in under half an hour for each of the three image processing kernels.

MAMPSX: A demonstration of rapid, predictable HMPSOC synthesis

Fernando

Wijtvliet

Siyoum

et al. 2013

2013 23rd International Conference on Field Programmable Logic and Applications

Self Cite

Heterogeneous Multiprocessor systems-on-chip (HMPSoC) are becoming popular as a means of meeting energy efficiency requirements of modern embedded systems. However, as these HMPSoCs run multimedia applications as well, they also need to meet real-time requirements. Designing HMPSoCs with predictable timing behavior is a key challenge, as the current design methods for these platforms are semi-automated, non-predictable, or support limited heterogeneity. In this demonstration, we present a design framework to rapidly generate and implement predictable HMPSoC designs. It takes the application specifications and the architecture model as input and generates the entire HMPSoC, for FPGA prototyping, that meets the throughput constraints of the application. We also present results of a case study that computes the performance-power tradeoffs of an industrial vision application. A tool-chain targeting the Xilinx Zynq FPGA is also presented.
