Placement-and-routing-based register allocation for coarse-grained reconfigurable arrays

De Sutter, Bjorn; Coene, Paul; Vander Aa, Tom; Mei, Bingfeng

doi:10.1145/1379023.1375678

Cited by 14 publications

(25 citation statements)

References 24 publications

“…Using (3-4), gr1+0 will be self-updated as gr1+=4. Applying to (3)(4)(5), an execution unit for the add is configured with a self-forwarding. According to (4-1), since results of add and sub can be supplied to execution units in stage 1 by using a forwarding path from EXEC output registers, prop skp[gr1], prop skp[gr3] and prop skp[z] in stage 1 are set to one at step 1.…”

Section: Implementation and Example (mentioning)

confidence: 99%

“…A research by Hrishikesh, et al [11] indicated that an optimal delay for one stage is around 6 to 8 FO4. Viji Srinivasan, et al [12] showed that an optimal design point based on a power-performance metric ((Billions of Instructions Per Second)^3/Watt) is 18 FO4 per pipeline stage. It is our future work to find optimal depth of pipeline stages for LAPP.…”

Section: Circuit Area and Delay Time (mentioning)

confidence: 99%

“…ADRES [3] is composed of a VLIW engine and an accelerator engine. The VLIW engine is used to support a VLIW-like programming model for legacy code.…”

Section: Related Work (mentioning)

confidence: 99%

“…Alternatively, many studies have proposed relieving the development cycle problem of ASICs by using techniques such as Many-Core Architectures (MCAs) [1], [2] and Coarse-Grained Reconfigurable Architectures (CGRAs) [3]. MCAs are composed of general purpose processors (GPPs), which are able to execute conventional Instruction Set Architecture (ISA) based machine instructions and thus provide high-programmability with the support of existing compilers.…”

Section: Introduction (mentioning)

confidence: 99%

An Instruction Mapping Scheme for FU Array Accelerator

Yoshimura, Iwakami, Nakada, et al. 2011

IEICE Trans. Inf. & Syst.

SUMMARY: Recently, we have proposed using a Linear Array Pipeline Processor (LAPP) to improve energy efficiency for various workloads such as image processing and to maintain programmability by working on VLIW codes. In this paper, we proposed an instruction mapping scheme for LAPP to fully exploit the array execution of functional units (FUs) and bypass networks by a mapper to fit the VLIW codes onto the FUs. The mapping can be finished within multi-cycles during a data prefetch before the array execution of FUs. According to an HDL based implementation, the hardware required for mapping scheme is 84% of the cost introduced by a baseline method. In addition, the proposed mapper can further help to shrink the size of array stage, as our results show that their combination becomes 88% of the baseline model in area.

“…Modulo scheduling techniques for CGRAs [15,17,20,39,45,46] only schedule loops that are free of control flow transfers. Hence any loop body that contains conditional statements first needs to be if-converted into hyperblocks by means of predication [36].…”

Section: Predication (mentioning)

confidence: 99%

Coarse-Grained Reconfigurable Array Architectures

De Sutter, Raghavan, Lambrechts 2010

Handbook of Signal Processing Systems

Self Cite

Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.
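The Predication statement quoted above says that a loop body containing conditional statements must be if-converted before it can be modulo scheduled. A minimal sketch of that idea follows, assuming a toy loop body that is not taken from any of the cited papers. The function names and the select-style (arithmetic) form of predication are illustrative assumptions; real hyperblock formation in a CGRA compiler involves considerably more than this.

```python
def loop_with_branch(a, b):
    """Original loop: the body contains a control-flow transfer (if/else)."""
    out = []
    for x, y in zip(a, b):
        if x > y:
            r = x - y
        else:
            r = x + y
        out.append(r)
    return out


def loop_if_converted(a, b):
    """Hypothetical if-converted loop: a straight-line body with no branch.

    The condition becomes a predicate value, both arms are computed
    unconditionally, and the predicate selects which result is kept.
    The control dependence has been turned into a data dependence.
    """
    out = []
    for x, y in zip(a, b):
        p = int(x > y)                      # predicate: 1 if taken, else 0
        t_sub = x - y                       # "then" arm, always computed
        t_add = x + y                       # "else" arm, always computed
        r = p * t_sub + (1 - p) * t_add     # select by predicate
        out.append(r)
    return out
```

Both functions return the same list for the same inputs; the second form simply has a loop body free of control flow, which is the precondition the quoted statement attributes to modulo scheduling on CGRAs.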
