2012 International Symposium on System on Chip (SoC)
DOI: 10.1109/issoc.2012.6376362

A multi-banked shared-L1 cache architecture for tightly coupled processor clusters

Abstract: A shared-L1 cache architecture is proposed for tightly coupled processor clusters. Sharing an L1 tightly coupled data memory (TCDM) among a significant number of processors (up to 16) is challenging in terms of speed. Sharing an L1 cache is even more challenging, since its operation is more complex, although it eases programming. The feasibility in terms of performance of a shared TCDM was shown in the STMicroelectronics Platform 2012, but the performance cost of supporting a shared L1 cache remains to be proven. In this paper we …
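The abstract describes a multi-banked shared-L1 memory serving up to 16 processors. The paper's own design details are truncated above, so the following is only an illustrative sketch of the word-level bank interleaving commonly used in such multi-banked shared memories; the constants and function names are assumptions, not taken from the paper.

```python
# Illustrative sketch (assumed parameters, not from the paper):
# low-order word interleaving for a multi-banked shared-L1 memory.

WORD_BYTES = 4   # assumed 32-bit words
NUM_BANKS = 16   # assumed one bank per core in a 16-core cluster

def bank_of(addr: int, num_banks: int = NUM_BANKS) -> int:
    """Low-order interleaving: consecutive words map to consecutive
    banks, spreading sequential accesses across all banks."""
    return (addr // WORD_BYTES) % num_banks

def bank_offset(addr: int, num_banks: int = NUM_BANKS) -> int:
    """Word index inside the selected bank."""
    return addr // (WORD_BYTES * num_banks)

# Sixteen consecutive word addresses land in sixteen distinct banks,
# so sequential accesses by the cores do not conflict on any one bank.
addrs = [i * WORD_BYTES for i in range(NUM_BANKS)]
print([bank_of(a) for a in addrs])
```

Low-order interleaving is what makes the "best-case read latency of one clock cycle" cited below plausible: when concurrent requests select different banks, they can all be served in the same cycle without arbitration stalls.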

Cited by 7 publications (3 citation statements)
References 7 publications
“…On and Hussin [19] analysed the impact that different many-core clustering methods have on multiprocessing architectures. To improve performance, Kakoee et al [20] proposed a shared-L1 cache architecture for tightly coupled processor clusters. These works demonstrate that memory access latencies differ strongly in such architectures, depending on the data locality on the clusters.…”
Section: Literature Review
confidence: 99%
“…The work of Rahimi et al is extended in [19] by a controllable pipeline stage between the CPUs and memory banks to be more reliable and variation-tolerant. In [10] a shared L1 data cache is presented. Using the logarithmic interconnect network proposed by Rahimi et al, the best-case read latency is one clock cycle.…”
Section: Related Work
confidence: 99%
“…Streaming applications are characterized by continuous processing of a data stream via many different tasks. Due to the static data-flow between these tasks, the CoreVA-MPSoC uses software-managed scratchpad memories instead of caches, as they are used in [7], [10], [13], [16], and [17]. In contrast to the Epiphany [8], our CoreVA-MPSoC features a hierarchical communication infrastructure.…”
Section: Related Work
confidence: 99%