2013 International Symposium on Rapid System Prototyping (RSP) 2013
DOI: 10.1109/rsp.2013.6683970

MAMPSx: A design framework for rapid synthesis of predictable heterogeneous MPSoCs

Abstract: Heterogeneous Multiprocessor System-on-Chips (HMPSoC) are becoming popular as a means of meeting energy efficiency requirements of modern embedded systems. However, as these HMPSoCs run multimedia applications as well, they also need to meet real-time requirements. Designing these predictable HMPSoCs is a key challenge, as the current design methods for these platforms are either semi-automated, non-predictable, or have limited heterogeneity. In this paper, we propose a design framework to generate and …

Citations: cited by 6 publications (2 citation statements)
References: 17 publications (12 reference statements)
“…For SPINE, which uses the same skeleton template for all values of P, we observe a much flatter scaling. It also outperforms a manual implementation (described in [16]) that has an input buffer with equal width as the AXI interface and requires five cycles (4 load, 1 execute, non-pipelined) per execution of 128 parallel PEs. If this manual accelerator would perform only 32-parallel operations, only a single load-cycle would be required, and it would be at a point similar to P = 32 for SPINE.…”
Section: Discussion (mentioning)
confidence: 99%
“…The implementation target was the Zynq XC7Z045 FPGA device. Figure 5 shows the area-performance trade-offs for the three kernels using: (1) a single design point from a hand-coded RTL implementation [12], (2) a single design point from the Vivado OpenCV video library [9], (3) multiple design points (P0 = 1..128) generated from naive C through HLS (HLS-C), and (4) multiple design points generated from C' through HLS using (AS)². The post place-and-route results are presented.…”
Section: Methods (mentioning)
confidence: 99%