2011
DOI: 10.1109/tvlsi.2009.2034167

High Performance and Area Efficient Flexible DSP Datapath Synthesis

Cited by 15 publications (27 citation statements)
References 25 publications
“…The interconnection topology of PEA also has a great variety, which could be further categorized into two types [9] according to the interconnection scheme: 1) array-based PEA [6], [10]-[12]; and 2) row-based PEA [8], [13], [14]. The routing topologies in array-based CGRAs are usually mesh or mesh-plus, where each PE could connect to its four or eight neighbors.…”
Section: A Target Architecture Overview and Mapping Methods (mentioning)
confidence: 99%
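
The neighbor counts quoted above can be illustrated with a small sketch. It assumes that "four neighbors" means the north, south, east, and west PEs, that "eight neighbors" adds the four diagonals, and that the array has no wrap-around links, so border PEs have fewer neighbors. The function name `pe_neighbors` and its parameters are hypothetical and do not come from the cited papers.

```python
from typing import List, Tuple


def pe_neighbors(row: int, col: int, rows: int, cols: int,
                 eight_way: bool = False) -> List[Tuple[int, int]]:
    """Return the grid coordinates of the PEs adjacent to (row, col).

    Assumption: 4-way connectivity is N/S/E/W; 8-way adds the diagonals.
    Assumption: no wrap-around (torus) links at the array border.
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight_way:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < rows and 0 <= col + dc < cols
    ]


# Interior PE of a 3x3 array under both connectivity assumptions.
print(pe_neighbors(1, 1, 3, 3))
print(pe_neighbors(1, 1, 3, 3, eight_way=True))
```

An interior PE reaches four or eight neighbors under these assumptions, while a corner PE reaches only two or three.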
“…As discussed in [10], row-based architectures exhibit low area complexity, high hardware utilization and relatively small configuration words, and we can optimize the kernel mapping and the final architecture instantiation jointly with tailored datapaths. Therefore we focus on row-based CGRAs, as shown in Fig.…”
Section: Target Architecture (mentioning)
confidence: 99%
“…We calculate the $\mathrm{TCL}_i$ of each cut $S_i$ independently, and then calculate the sum $\mathrm{TCL}_{\mathrm{sum}} = \sum_i \mathrm{TCL}_i$. With all the discussion above, we could give the analytical form of performance metric in (10). Here $p \in [1, P]$ (we assume that $P_0 = 1$ and $P_I = P$) is the index number of the PEA operation, $f$ is the clock frequency, $I$ is the number of cuts, $t_{p+1} = t_p + \Delta_{\mathrm{CFG},p} + \Delta_{\mathrm{LD},p} + \Delta_{\mathrm{EXE},p} + \Delta_{\mathrm{ST},p}$, and $t_1 = 0$.…”
Section: Performance Metric (mentioning)
confidence: 99%
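
The recurrence in the statement above (t_1 = 0, and each t_{p+1} adds four per-operation delays to t_p) can be unrolled as in the following sketch. The excerpt does not define the delay terms; reading CFG, LD, EXE, and ST as configuration, load, execution, and store delays, measured in clock cycles, is an assumption. The names `OpDelays` and `start_times` are hypothetical, and the sketch does not reproduce the cited paper's full performance metric (10), which the excerpt does not give.

```python
from typing import List, NamedTuple


class OpDelays(NamedTuple):
    """Per-operation delays (assumed to be in clock cycles)."""
    cfg: int  # assumed: configuration delay
    ld: int   # assumed: load delay
    exe: int  # assumed: execution delay
    st: int   # assumed: store delay


def start_times(delays: List[OpDelays]) -> List[int]:
    """Unroll t_{p+1} = t_p + cfg_p + ld_p + exe_p + st_p with t_1 = 0.

    Returns [t_1, ..., t_{P+1}] for P operations.
    """
    times = [0]  # t_1 = 0
    for d in delays:
        times.append(times[-1] + d.cfg + d.ld + d.exe + d.st)
    return times


# Two PEA operations with made-up delay values.
ops = [OpDelays(cfg=2, ld=3, exe=10, st=1), OpDelays(cfg=0, ld=3, exe=10, st=1)]
print(start_times(ops))
```

The last entry of the returned list is the cycle at which the final operation completes under this reading of the recurrence.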
“…High-performance flexible datapaths [2], [4], [6], [7], [10] have been proposed to efficiently map primitive or chained operations found in the initial data-flow graph (DFG) of a kernel. The templates of complex chained operations are either extracted directly from the kernel's DFG [10] or specified in a predefined behavioral template library [4], [6], [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…The templates of complex chained operations are either extracted directly from the kernel's DFG [10] or specified in a predefined behavioral template library [4], [6], [7]. Design decisions on the accelerator's datapath highly impact its efficiency.…”
Section: Introduction (mentioning)
confidence: 99%