Proceedings of the ACM SIGPLAN/SIGBED 2010 Conference on Languages, Compilers, and Tools for Embedded Systems
DOI: 10.1145/1755888.1755892

Operation and data mapping for CGRAs with multi-bank memory


Cited by 12 publications (8 citation statements)
References 15 publications
“…For the CGRA coprocessor, we assume that its input/output are provided on its local memory, which may be multi-banked to provide high bandwidth toward the processing elements in the coprocessor [Bougard et al 2008;Kim et al 2010]. Recent CGRA coprocessors [Bougard et al 2008;Mei et al 2004] can access any data on its local memory using addressed load/store operations, but the addresses must be linear (or at least easily computable using arithmetic operations only).…”
Section: System Architecture
confidence: 99%
“…Recent studies [16], [17] propose various solutions to reduce the data transfer overhead between the system memory, a local memory, and the processing elements from the architecture and compiler perspective. In particular, the ADRES architecture [18] allows tight coupling between main processor and CGRA, by reconfiguring some processing elements of the CGRA as a VLIW processor.…”
Section: Related Work
confidence: 99%
“…For the CGRA coprocessor we assume that its input/output are provided on its local memory, which may be multi-banked to provide high bandwidth toward the processing elements in the coprocessor [17], [18]. Recent CGRA coprocessors [18], [19] can access any data on its local memory using addressed load/store operations, but the addresses must be linear (or at least easily computable using arithmetic operations only).…”
Section: System Architecture
confidence: 99%
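The quote above notes that CGRA local-memory addresses "must be linear (or at least easily computable using arithmetic operations only)". A minimal sketch of what such an affine access function looks like is given below; the function and parameter names (`affine_address`, `base`, `strides`) are illustrative, not from the cited papers.

```python
def affine_address(base, strides, indices):
    """Byte address = base + sum(stride_k * index_k).

    An access of this form is affine in the loop indices, so a CGRA
    address generator can compute it with arithmetic alone (no pointer
    chasing through memory).
    """
    addr = base
    for stride, index in zip(strides, indices):
        addr += stride * index
    return addr

# Example: element A[2][3] of a 64x64 row-major int32 array at base 0x1000:
# 0x1000 + 2 * (64 * 4) + 3 * 4 = 4620
addr = affine_address(0x1000, strides=(64 * 4, 4), indices=(2, 3))
```

Accesses that cannot be expressed this way (e.g., indirect loads `A[B[i]]`) would fall outside the "easily computable" class the quote describes.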
“…If all the code and data of that task that is mapped to the SPE fit in the local memory of the SPE, then very power-efficient execution is achieved. In fact, the peak power-efficiency of the IBM Cell processor is 5.1 Giga operations per second per watt [17]. Contrast this with the power-efficiency of traditional shared memory multi-cores, e.g., the Intel Core2 Quad is only 0.35 Giga operations per second per watt [17].…”
Section: Introduction
confidence: 98%
“…In fact, the peak power-efficiency of the IBM Cell processor is 5.1 Giga operations per second per watt [17]. Contrast this with the power-efficiency of traditional shared memory multi-cores, e.g., the Intel Core2 Quad is only 0.35 Giga operations per second per watt [17]. However, if the code and data of the application do not fit into the local memory, then the global memory must be leveraged to contain them through explicit DMA calls.…”
Section: Introduction
confidence: 99%
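The arithmetic behind the comparison quoted above, using only the figures the citing paper gives (5.1 GOPS/W for the IBM Cell, 0.35 GOPS/W for the Intel Core2 Quad):

```python
# Back-of-envelope ratio of the two power-efficiency figures quoted
# in the citation statement above.
cell_gops_per_watt = 5.1          # IBM Cell, peak
core2_quad_gops_per_watt = 0.35   # Intel Core2 Quad

ratio = cell_gops_per_watt / core2_quad_gops_per_watt
# ratio is roughly 14.6, i.e. a ~14-15x peak power-efficiency advantage
# for the local-memory (scratchpad) design over the shared-memory multi-core.
```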