Darkside: 2.6GFLOPS, 8.7mW Heterogeneous RISC-V Cluster for Extreme-Edge On-Chip DNN Inference and Training

Garofalo, Angelo; Perotti, Matteo; Valente, Luca; Tortorella, Yvan; Nadalini, Alessandro; Benini, Luca; Rossi, Davide; Conti, Francesco

doi:10.1109/esscirc55480.2022.9911384

Cited by 3 publications

(1 citation statement)

References 10 publications


“…The additional cores and efficient architecture are noticeable in the performance. MinPool's matmul performance outperforms all chips by a factor of 1.1-6×. It even outperforms Thestral [7], which is implemented in a newer technology and runs at 2.5× the frequency.…”

Section: Related Work (mentioning)

confidence: 99%

MinPool: A 16-core NUMA-L1 Memory RISC-V Processor Cluster for Always-on Image Processing in 65nm CMOS

Riedel, Cavalcante, Frouzakis et al. 2023

2023 30th IEEE International Conference on Electronics, Circuits and Systems (ICECS)

Self Cite


Always-on image processing is crucial for many applications, such as face and attention detection, and it is usually offloaded to dedicated, energy-efficient image processors. These processors need to be flexible and scalable to follow the rapid evolution of image sensors and always-on image processing workloads. A flexible architecture is the shared memory cluster, where multiple cores are tightly coupled with L1 memory. However, current clusters are not latency tolerant and follow a uniform memory access approach, which limits their frequency and scalability. The MemPool architecture [1] lifts those constraints by combining latency-tolerant cores, pipelined functional processing units, and a non-uniform memory access interconnect. This paper presents MinPool, a low-power image processor for always-on functions implemented in TSMC's 65 nm technology and based on a tailored MemPool architecture. Thanks to an instruction set architecture extension tuned for image processing and the low-leakage process, it achieves excellent utilization results with IPCs of up to 0.98 and an energy efficiency of 65 GOPS/W for key image processing kernels.
