An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication

Kurth, Andreas; Rönninger, Wolfgang; Benz, Thomas; Cavalcante, Matheus; Schuiki, Fabian; Zaruba, Florian; Benini, Luca

doi:10.1109/tc.2021.3107726

Cited by 14 publications

(12 citation statements)

References 29 publications


“…In particular, we examine three aggregated interconnect bandwidths between CLs and L2 as wired: 22.4, 44.8, and 89.6 Gbit/s at f_clock = 350 MHz, which corresponds to an interconnect bandwidth of 64, 128, and 256 bit/cycle, respectively. In this way, we span a wide range of available wired interconnect resources that can be instantiated in this kind of system [17]. Moreover, we assume a very optimistic latency of 9 cycles between CL and L2.…”

Section: Simulation Methodology (mentioning)

confidence: 99%

Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference

Bruschi, Tagliavini, Conti, et al. 2022

2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS)

Self Cite


Analog In-Memory Computing (AIMC) is emerging as a disruptive paradigm for heterogeneous computing, potentially delivering orders of magnitude better peak performance and efficiency than traditional digital signal processing architectures on matrix-vector multiplication. However, to sustain this throughput in real-world applications, AIMC tiles must be supplied with data at very high bandwidth and low latency; this places unprecedented pressure on the on-chip communication infrastructure, which becomes the system's performance and efficiency bottleneck. In this context, the performance and plasticity of emerging on-chip wireless communication paradigms provide the required breakthrough to up-scale on-chip communication in large AIMC devices. This work presents a many-tile AIMC architecture with inter-tile wireless communication that integrates multiple heterogeneous computing clusters, embedding a mix of parallel RISC-V cores and AIMC tiles. We perform an extensive design space exploration of the proposed architecture and discuss the benefits of exploiting emerging on-chip communication technologies such as wireless transceivers in the millimeter-wave and terahertz bands.
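The bandwidth figures in the citation statement above can be checked with a short calculation. The following is a minimal sketch, assuming the usual relation bandwidth = link width × clock frequency; the function names are illustrative and do not come from the cited or citing paper.

```python
# Assumption: aggregate bandwidth [bit/s] = link width [bit/cycle] * clock [Hz].
F_CLOCK_HZ = 350e6  # clock frequency stated in the quoted passage


def aggregate_bandwidth_gbit_s(bits_per_cycle, f_clock_hz=F_CLOCK_HZ):
    """Convert a link width in bit/cycle to a bandwidth in Gbit/s."""
    return bits_per_cycle * f_clock_hz / 1e9


def latency_ns(cycles, f_clock_hz=F_CLOCK_HZ):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / f_clock_hz * 1e9


if __name__ == "__main__":
    for width in (64, 128, 256):
        print(f"{width} bit/cycle -> {aggregate_bandwidth_gbit_s(width):.1f} Gbit/s")
    print(f"9 cycles -> {latency_ns(9):.1f} ns")
```

Under this assumption, the three link widths reproduce the quoted 22.4, 44.8, and 89.6 Gbit/s, and the 9-cycle latency corresponds to roughly 25.7 ns at 350 MHz.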



“…Therefore, the growing number of PEs connected to the AXI interconnect leads to an increase in memory access latency per each PE and increases the probability of memory congestion due to limited crossbar bandwidth. In order to reduce memory congestion per tile, we use an open-source high-performance coherent AXI interconnect implementation [31], [32]. The AXI interconnect is based on a fully-connected crossbar where each slave port has a dedicated connection to each master port.…”

Section: A Modular and Configurable Compute Tiles (mentioning)

confidence: 99%

AGILER: An Adaptive Heterogeneous Tile-Based Many-Core Architecture for RISC-V Processors

Kamaleldin, Göhringer 2022

IEEE Access


Tile-based many-core architectures are extensively used in modern system-on-chip designs to achieve scalable computing performance with adequate energy efficiency. Heterogeneity is the key element to boost computing performance and keep energy consumption within limits for several application domains. However, the steadily increasing use of custom heterogeneous tiles raises design and integration costs and limits tile reusability. The recent widespread adoption of the open-source RISC-V ISA provides the potential to develop modular compute units that can be used in many application domains with a large reduction in non-recurring engineering costs. The motivation of this work is to bring design modularity and adaptability to heterogeneous tile-based many-core architectures by increasing their flexibility to realize different many-core configurations with less design time and cost. In this work, AGILER is proposed as an adaptive tile-based many-core architecture for heterogeneous RISC-V based processors. The proposed architecture consists of modular and adaptable heterogeneous multi-/single-core compute tiles that support 32-/64-bit RISC-V ISAs with different memory hierarchies. Inter-tile communication is based on a scalable network-on-chip architecture to achieve a high degree of system scalability. AGILER supports run-time adaptation through a custom internal reconfiguration manager for dynamic and partial reconfiguration on Xilinx FPGAs. Evaluation results demonstrate that the proposed architecture features scalable computing performance of up to 685 MOPS for 8x32-bit tiles and 316 MOPS for 8x64-bit tiles, with a scalable memory bandwidth of up to 7.4 GB/s. AGILER is evaluated on a Xilinx Virtex UltraScale+ FPGA with a maximum reconfiguration time of 38.1 ms for a single compute tile.

Index Terms: Many-core architecture, parallel computing, RISC-V, network-on-chip (NoC), field programmable gate array (FPGA), reconfigurable computing.
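The citation statement above describes a fully-connected crossbar in which each slave port has a dedicated connection to each master port. As a rough sketch of why this scales poorly with the number of attached PEs, the snippet below enumerates those connections; the port counts and the function name are hypothetical and are not taken from the AGILER paper or the cited platform.

```python
from itertools import product


def crossbar_connections(num_slave_ports, num_master_ports):
    """List every (slave_port, master_port) pair of a fully-connected crossbar."""
    return list(product(range(num_slave_ports), range(num_master_ports)))


if __name__ == "__main__":
    # Hypothetical square configurations, chosen only for illustration.
    for ports in (2, 4, 8):
        links = crossbar_connections(ports, ports)
        print(f"{ports}x{ports} crossbar: {len(links)} dedicated connections")
```

The connection count is the product of the two port counts, so doubling both sides quadruples the wiring, which is one common reason designs limit crossbar size per tile.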


“…Some works adopt cache coherence protocols, as the distributed directory-based protocol [2,12]. Other works [8,9,15], argue that cache coherence protocols are not scalable for many-cores due to their high cost in terms of synchronization overhead and energy consumption observed, specifically, for streaming data-flow applications [15]. Instead, the alternative is to rely upon software-managed scratchpad memory close to each CPU, with the communication among CPUs initialized by software [1,7,8,14,15].…”

Section: Overview Of Many-core Platforms and Debugging (mentioning)

confidence: 99%

ManyGUI: A Graphical Tool to Accelerate Many-core Debugging Through Communication, Memory, and Energy Profiling

Ruaro, Martin 2022

System Engineering for Constrained Embedded Systems


The debugging and validation of a many-core design is a complex task due to the numerous events happening in the system simultaneously. Current state-of-the-art many-cores rely heavily on waveforms and log files to validate their behavior during simulation. Our hypothesis is that, as happened with ASIC development, a Graphical User Interface (GUI) can significantly accelerate many-core development. To support this, we propose an open-source GUI tool called ManyGUI for many-core debugging. ManyGUI is organized as a framework that collects and classifies high-level events during simulation related to computation (executed CPU instructions), memory, and communication (NoC packets). These events are shown graphically to the developer through a set of intuitive and practical frames. We evaluate ManyGUI on a silicon-proven state-of-the-art open-source many-core called OpenPiton, which uses 64-bit RISC-V CPUs, 3 NoCs, and a distributed/shared cache memory organization. Results show that ManyGUI allows the developer to rapidly obtain a comprehensive view of the many-core behavior in terms of communication statistics (packet paths, link utilization), memory statistics (memory accesses, miss rate), and energy (CPU, memory, and NoC).

CCS Concepts: • Hardware → Simulation and emulation; • Computer systems organization → Multicore architectures.
