2021
DOI: 10.1109/tc.2021.3107726

An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication

Abstract: On-chip communication infrastructure is a central component of modern systems-on-chip (SoCs), and it continues to gain importance as the number of cores, the heterogeneity of components, and the on-chip and off-chip bandwidth continue to grow. Decades of research on on-chip networks enabled cache-coherent shared-memory multiprocessors. However, communication fabrics that meet the needs of heterogeneous many-cores and accelerator-rich SoCs, which are not, or only partially, coherent, are a much less mature rese…

Citations: Cited by 14 publications (12 citation statements)
References: 29 publications

“…In particular, we examine three aggregated interconnect bandwidths between CLs and L2 as wired: 22.4, 44.8, and 89.6 Gbit/s at f_clock = 350 MHz, which corresponds to an interconnect bandwidth of 64, 128, and 256 bit/cycle, respectively. In this way, we span a wide range of available wired interconnect resources that can be instantiated in this kind of system [17]. Moreover, we assume a very optimistic latency of 9 cycles between CL and L2.…”
Section: Simulation Methodology (mentioning)
confidence: 99%
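
The bandwidth figures quoted in this statement follow from multiplying the interconnect width in bits per cycle by the clock frequency. A minimal sketch of that arithmetic is shown here; the function and variable names are chosen for illustration and do not come from the citing paper. It also converts the quoted 9-cycle latency to nanoseconds at the same clock.

```python
# Sanity check of the quoted figures: bandwidth = width (bit/cycle) * f_clock.
# All names here are illustrative assumptions, not taken from the citing paper.

F_CLOCK_HZ = 350e6  # 350 MHz, as quoted in the statement


def bandwidth_gbit_per_s(bits_per_cycle, f_clock_hz=F_CLOCK_HZ):
    """Aggregate bandwidth in Gbit/s for a given interconnect width."""
    return bits_per_cycle * f_clock_hz / 1e9


def latency_ns(cycles, f_clock_hz=F_CLOCK_HZ):
    """Convert a latency given in clock cycles to nanoseconds."""
    return cycles / f_clock_hz * 1e9


widths = [64, 128, 256]  # bit/cycle, as quoted
bandwidths = [bandwidth_gbit_per_s(w) for w in widths]

for width, bw in zip(widths, bandwidths):
    print(f"{width:3d} bit/cycle -> {bw:.1f} Gbit/s")

print(f"9 cycles -> {latency_ns(9):.2f} ns")
```

The three results match the 22.4, 44.8, and 89.6 Gbit/s values in the statement, so the quoted widths and bandwidths are mutually consistent.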
“…Therefore, the growing number of PEs connected to the AXI interconnect leads to an increase in memory access latency per each PE and increases the probability of memory congestion due to limited crossbar bandwidth. In order to reduce memory congestion per tile, we use an open-source high-performance coherent AXI interconnect implementation [31], [32]. The AXI interconnect is based on a fully-connected crossbar where each slave port has a dedicated connection to each master port.…”
Section: A Modular and Configurable Compute Tiles (mentioning)
confidence: 99%
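
The statement above describes a fully connected crossbar in which each slave port has a dedicated connection to each master port. As a rough illustration of how such a topology scales, the sketch below counts those dedicated connections as the product of the two port counts. The function name and the port counts are hypothetical; this is not a model of the cited interconnect's implementation.

```python
# Illustrative only: count dedicated connections in a fully connected crossbar,
# assuming one connection per (slave port, master port) pair.
# Function name and port counts are hypothetical.


def crossbar_links(num_slave_ports, num_master_ports):
    """Number of dedicated slave-to-master connections."""
    return num_slave_ports * num_master_ports


for ports in (2, 4, 8, 16):
    links = crossbar_links(ports, ports)
    print(f"{ports:2d} x {ports:2d} ports -> {links:3d} connections")
```

Under this assumption the connection count grows with the product of the port counts, which is one reason a growing number of PEs puts pressure on a single crossbar.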
“…Some works adopt cache coherence protocols, as the distributed directory-based protocol [2,12]. Other works [8,9,15], argue that cache coherence protocols are not scalable for many-cores due to their high cost in terms of synchronization overhead and energy consumption observed, specifically, for streaming data-flow applications [15]. Instead, the alternative is to rely upon software-managed scratchpad memory close to each CPU, with the communication among CPUs initialized by software [1,7,8,14,15].…”
Section: Overview of Many-core Platforms and Debugging (mentioning)
confidence: 99%