2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS) 2022
DOI: 10.1109/aicas54282.2022.9869996
Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference

Abstract: Analog In-Memory Computing (AIMC) is emerging as a disruptive paradigm for heterogeneous computing, potentially delivering orders-of-magnitude gains in peak performance and efficiency over traditional digital signal-processing architectures on matrix-vector multiplication. However, to sustain this throughput in real-world applications, AIMC tiles must be supplied with data at very high bandwidth and low latency; this poses unprecedented pressure on the on-chip communication infrastructure, which becomes the system's…

Cited by 2 publications (2 citation statements)
References 17 publications
“…b.) Parallelized convolution [14]: This is a CNN-based inference workload in which the layers of the network and inputs are tiled and deployed on separate cores. This is a pure L2 to L1 (core) and L1 (core) to L2 memory traffic pattern and has no intercore communication.…”
Section: B. Synthetic Traffic
confidence: 99%
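The parallelized-convolution pattern described above can be sketched as a simple traffic model. This is a hypothetical illustration (the function name and byte counts are assumptions, not from the paper): every core only exchanges its tile with L2, so core-to-core traffic is zero by construction.

```python
# Hypothetical model of the "parallelized convolution" traffic pattern:
# layers and inputs are tiled across cores, and each core talks only
# to L2 (fetch its tile, write its result) -- no inter-core links.

def parallelized_traffic(num_cores, tile_bytes):
    """Return (l2_to_l1, l1_to_l2, core_to_core) byte counts."""
    l2_to_l1 = num_cores * tile_bytes   # each core fetches its own tile from L2
    l1_to_l2 = num_cores * tile_bytes   # each core writes its partial result back
    core_to_core = 0                    # no inter-core communication by design
    return l2_to_l1, l1_to_l2, core_to_core

print(parallelized_traffic(4, 1024))  # (4096, 4096, 0)
```

The model captures why this workload stresses only the memory-side links of the interconnect: all bandwidth demand concentrates between L2 and the per-core L1 memories.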
“…This is a pure L2 to L1 (core) and L1 (core) to L2 memory traffic pattern and has no intercore communication. c.) Pipelined convolution [14]: Depth-first or pipeline dataflow is used in many new DNN platforms to efficiently run CNN-based inference. In this scheme, layers are executed in parallel, in a pipelined way across the different cores to reduce the data traffic to higher memory levels [15].…”
Section: B. Synthetic Traffic
confidence: 99%
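By contrast, the pipelined (depth-first) dataflow quoted above can be sketched with the same kind of toy model. This is an assumption-laden sketch (function name and uniform activation size are invented for illustration): one layer per core, so intermediate activations hop core-to-core and only the first input and final output touch L2, which is how this scheme reduces traffic to higher memory levels.

```python
# Hypothetical model of the "pipelined convolution" (depth-first) dataflow:
# layer i runs on core i; intermediate activations move between adjacent
# cores, so L2 sees only the network input and the final output.

def pipelined_traffic(num_layers, activation_bytes):
    """Return (l2_to_l1, l1_to_l2, core_to_core) byte counts."""
    l2_to_l1 = activation_bytes                         # input fed to the first core
    l1_to_l2 = activation_bytes                         # output written by the last core
    core_to_core = (num_layers - 1) * activation_bytes  # one hop per pipeline stage boundary
    return l2_to_l1, l1_to_l2, core_to_core

print(pipelined_traffic(3, 100))  # (100, 100, 200)
```

Comparing the two sketches makes the trade-off concrete: parallelized convolution loads only the memory hierarchy, while pipelined convolution shifts most of the bandwidth demand onto the inter-core fabric.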