High Performance and Area Efficient Flexible DSP Datapath Synthesis

Xydis, Sotirios; Economakos, George; Soudris, Dimitrios; Pekmestzi, Kiamal

doi:10.1109/tvlsi.2009.2034167

Cited by 15 publications

(27 citation statements)

References 25 publications


“…The interconnection topology of PEA also has a great variety, which could be further categorized into two types [9] according to the interconnection scheme: 1) array-based PEA [6], [10]-[12]; and 2) row-based PEA [8], [13], [14]. The routing topologies in array-based CGRAs are usually mesh or mesh-plus, where each PE could connect to its four or eight neighbors.…”

Section: A Target Architecture Overview and Mapping Methods (mentioning)

confidence: 99%
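
The statement above says each PE in an array-based CGRA connects to four or eight neighbors. A minimal sketch of that neighbor enumeration follows. It assumes that "mesh" means the four orthogonal neighbors and "mesh-plus" adds the four diagonals; the excerpt does not define either term, and the function name `neighbors` and its parameters are hypothetical.

```python
def neighbors(row, col, rows, cols, mesh_plus=False):
    """List the in-bounds neighbors of PE (row, col) in a rows x cols array.

    Assumption: plain mesh = 4 orthogonal links; mesh-plus = those plus
    4 diagonal links. The quoted text only says "four or eight neighbors".
    """
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if mesh_plus:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < rows and 0 <= col + dc < cols
    ]


# An interior PE has the full neighbor count; PEs on the border have fewer.
interior_mesh = neighbors(1, 1, 3, 3)
interior_mesh_plus = neighbors(1, 1, 3, 3, mesh_plus=True)
corner_mesh = neighbors(0, 0, 3, 3)
```

Only interior PEs reach the full four or eight links in this sketch; whether a real array wraps its borders around is not stated in the excerpt.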

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures

Liu, Yin, Peng, et al. 2015

IEEE Trans. VLSI Syst.


Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their flexibility and efficiency. Loops in applications are often mapped onto CGRAs for acceleration, and the mapping of loops onto CGRA is quite a challenging work due to the parallel execution paradigm and constrained hardware resource. To map loops onto CGRAs efficiently, it is important to transform loops into pieces that obey hardware resource constraints with less overhead (e.g., communication and configuration overhead). In this paper, we tackle this problem by establishing a performance optimization problem, including loop transformation and backend placing and routing. A novel searching strategy is also designed to find the optimal result efficiently. Finally, we built a complete flow of mapping loop nests onto CGRA. Experiment results on most kernels of the Polybench show that our proposed approach can improve the performance of the kernels by 42% on average, as compared with the state-of-the-art methods. The runtime complexity of our approach is also acceptable.


“…As discussed in [10], row-based architectures exhibit low area complexity, high hardware utilization and relatively small configuration words, and we can optimize the kernel mapping and the final architecture instantiation jointly with tailored datapaths. Therefore we focus on row-based CGRAs, as shown in Fig.…”

Section: Target Architecture (mentioning)

confidence: 99%

“…We calculate the $TCL_i$ of each cut $S_i$ independently, and then calculate the sum $TCL_{sum} = \sum_i TCL_i$. With all the discussion above, we could give the analytical form of performance metric in (10). Here $p \in [1, P]$ (we assume that $P_0 = 1$ and $P_I = P$) is the index number of the PEA operation, $f$ is the clock frequency, $I$ is the number of cuts, $t_{p+1} = t_p + \Delta_{CFG,p} + \Delta_{LD,p} + \Delta_{EXE,p} + \Delta_{ST,p}$, and $t_1 = 0$.…”

Section: Performance Metric (mentioning)

confidence: 99%
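
The recurrence quoted above accumulates four per-operation terms into a start time for each PEA operation, beginning at t_1 = 0. A minimal sketch of that accumulation follows. It assumes the four Δ terms are cycle counts for configuration, load, execution, and store; the excerpt does not spell out the subscripts or the units. The function name `start_times` and the sample values are hypothetical, and equation (10) itself is not shown in the excerpt, so it is not reproduced here.

```python
from itertools import accumulate


def start_times(cfg, ld, exe, st):
    """Return [t_1, ..., t_{P+1}] for P PEA operations.

    Implements t_{p+1} = t_p + cfg[p] + ld[p] + exe[p] + st[p] with t_1 = 0.
    Assumption: each list holds one delay per operation, in clock cycles.
    """
    durations = [c + l + e + s for c, l, e, s in zip(cfg, ld, exe, st)]
    # initial=0 supplies t_1 = 0 (requires Python 3.8 or later).
    return list(accumulate(durations, initial=0))


# Hypothetical delays for three operations.
times = start_times(cfg=[2, 2, 2], ld=[4, 3, 5], exe=[10, 12, 8], st=[1, 1, 2])
print(times)  # [0, 17, 35, 52]
```

The last entry is the cycle at which the final operation completes under these assumptions; dividing it by the clock frequency f would give a wall-clock time if the Δ terms are indeed cycle counts.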

Battery-Aware Loop Nests Mapping for CGRAs

Peng, Yin, Liu, et al. 2015

IEICE Trans. Inf. & Syst.


SUMMARY: Coarse-grained Reconfigurable Architecture (CGRA) is a promising mobile computing platform that provides both high performance and high energy efficiency. In an application, loop nests are usually mapped onto CGRA for further acceleration, so optimizing the mapping is an important goal for design of CGRAs. Moreover, obviously almost all of mobile devices are powered by batteries, how to reduce energy consumption also becomes one of primary concerns in using CGRAs. This paper makes three contributions: a) Proposing an energy consumption model for CGRA; b) Formulating loop nests mapping problem to minimize the battery charge loss; c) Extract an efficient heuristic algorithm called BPMap. Experiment results on most kernels of the benchmarks and real-life applications show that our methods can improve the performance of the kernels and lower the energy consumption.


“…High-performance flexible datapaths [2], [4], [6], [7], [10] have been proposed to efficiently map primitive or chained operations found in the initial data-flow graph (DFG) of a kernel. The templates of complex chained operations are either extracted directly from the kernel's DFG [10] or specified in a predefined behavioral template library [4], [6], [7].…”

Section: Introduction (mentioning)

confidence: 99%

“…The templates of complex chained operations are either extracted directly from the kernel's DFG [10] or specified in a predefined behavioral template library [4], [6], [7]. Design decisions on the accelerator's datapath highly impact its efficiency.…”

Section: Introduction (mentioning)

confidence: 99%

Flexible DSP Accelerator Architecture Exploiting Carry-Save Arithmetic

Tsoumanis, Xydis, Zervakis, et al. 2016

IEEE Trans. VLSI Syst.

Self Cite


Hardware acceleration has been proved an extremely promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief, we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differentiate from previous works on flexible accelerators by enabling computations to be aggressively performed with carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are utilized enabling CS optimizations to be performed in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with the state-of-art flexible datapaths.
