Yvan Tortorella scite author profile

Yvan Tortorella

4 Publications

5 Citation Statements Received

61 Citation Statements Given

Affiliations

University of Bologna

Publications

RedMulE: A Compact FP16 Matrix-Multiplication Accelerator for Adaptive Deep Learning on RISC-V-Based Ultra-Low-Power SoCs

Tortorella, Bertaccini, Rossi, et al. (2022)

The fast proliferation of extreme-edge applications using Deep Learning (DL) based algorithms has required dedicated hardware to satisfy their latency, throughput, and precision requirements. While inference is achievable in practical cases, online fine-tuning and adaptation of general DL models are still highly challenging. One of the key stumbling blocks is the need for parallel floating-point operations, which are considered unaffordable on sub-100 mW extreme-edge SoCs. We tackle this problem with RedMulE (Reduced-precision matrix Multiplication Engine), a parametric low-power hardware accelerator for FP16 matrix multiplications (the main kernel of DL training and inference), conceived for tight integration within a cluster of tiny RISC-V cores based on the PULP (Parallel Ultra-Low-Power) architecture. In 22 nm technology, a 32-FMA RedMulE instance occupies just 0.07 mm² (14% of an 8-core RISC-V cluster) and reaches a maximum operating frequency of 666 MHz, with a throughput of 31.6 MAC/cycle (98.8% utilization). We reach a cluster-level power consumption of 43.5 mW and a full-cluster energy efficiency of 688 16-bit GFLOPS/W. Overall, RedMulE achieves up to 4.65× higher energy efficiency and a 22× speedup over software execution on 8 RISC-V cores.
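As a quick arithmetic check of the figures above, the short Python sketch below divides the sustained 31.6 MAC/cycle by the 32 FMA units of the instance. It assumes (our assumption, not stated in the abstract) that each FMA unit retires at most one MAC per cycle, so the peak is 32 MAC/cycle; the helper name `utilization` is hypothetical. The ratio is 0.9875, which rounds to the quoted 98.8%.

```python
# Worked check of the RedMulE utilization figure quoted in the abstract.
# Assumption: peak throughput = one MAC per FMA unit per cycle.
NUM_FMA_UNITS = 32             # "a 32-FMA RedMulE instance"
ACHIEVED_MAC_PER_CYCLE = 31.6  # sustained throughput from the abstract


def utilization(achieved: float, peak: float) -> float:
    """Hypothetical helper: fraction of the peak MAC/cycle actually sustained."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


u = utilization(ACHIEVED_MAC_PER_CYCLE, NUM_FMA_UNITS)
print(f"utilization = {u:.2%}")
```

The same helper applies to any instance size, since the abstract describes RedMulE as parametric; only the assumed one-MAC-per-FMA peak would need revisiting.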

HULK-V: a Heterogeneous Ultra-low-power Linux capable RISC-V SoC

Valente, Tortorella, Sinigaglia, et al. (2023)

IoT applications span a wide range in performance and memory footprint, under tight cost and power constraints. High-end applications rely on power-hungry Systems-on-Chip (SoCs) featuring powerful processors and large LPDDR/DDR3/4/5 memories, and supporting full-fledged Operating Systems (OS). By contrast, low-end applications typically rely on ultra-low-power microcontrollers with a "close to metal" software environment and simple micro-kernel-based runtimes. Emerging IoT applications and trends require the "best of both worlds": cheap, low-power SoCs with a well-known and agile software environment based on a full-fledged OS (e.g., Linux), coupled with extreme energy efficiency and parallel digital signal processing capabilities. We present HULK-V: an open-source Heterogeneous Linux-capable RISC-V-based SoC coupling a 64-bit RISC-V processor with an 8-core Programmable Multi-Core Accelerator (PMCA), delivering up to 13.8 GOps and up to 157 GOps/W, and accelerating the execution of complex DSP and ML tasks by up to 112× over the host processor. HULK-V leverages a lightweight, fully digital memory hierarchy based on HyperRAM IoT DRAM that exposes up to 512 MB of DRAM memory to the host CPU. By using HyperRAMs, HULK-V doubles energy efficiency without significant performance loss compared with power-hungry LPDDR memories, which require large and expensive mixed-signal PHYs. HULK-V, implemented in GlobalFoundries 22nm FDX technology, is a fully digital ultra-low-cost SoC running a 64-bit Linux software stack with OpenMP host-to-PMCA offload within a power envelope of just 250 mW.

Darkside: 2.6GFLOPS, 8.7mW Heterogeneous RISC-V Cluster for Extreme-Edge On-Chip DNN Inference and Training

Garofalo, Perotti, Valente, et al. (2022)

DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

Garofalo, Tortorella, Perotti, et al. (2022)

IEEE Open J. Solid-State Circuits Soc.

On-chip DNN inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of 8 RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost performance and efficiency on key compute-intensive Deep Neural Network (DNN) kernels, the cluster is enriched with three digital accelerators: a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); a minimal-overhead data mover to marshal 1-b to 32-b data on the fly; and a 16-bit floating-point Tensor Product Engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65 nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency, enough to enable on-chip floating-point training at competitive speed coupled with ultra-low-power quantized inference.
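To illustrate what "tiled matrix multiplication" means in general, the Python sketch below computes C = A·B by walking over fixed-size blocks of the output and of the shared inner dimension, so each inner block product touches only a small working set. This is a generic loop-tiling illustration under our own assumptions (the `tiled_matmul` name and `tile` parameter are hypothetical); it does not describe the actual dataflow, tile sizes, or FP16 arithmetic of the DARKSIDE TPE.

```python
from typing import List

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Hypothetical helper: C = A @ B computed block by block.

    Generic loop tiling only; NOT the DARKSIDE TPE hardware dataflow.
    """
    if tile < 1:
        raise ValueError("tile must be >= 1")
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops select an output tile (i0, j0) and a slice k0 of the
    # shared dimension; inner loops accumulate that small block product.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]  # one multiply-accumulate
                        c[i][j] = acc
    return c


# Usage: a 2x3 by 3x2 product with 2x2 tiles (edge tiles are partial).
print(tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], tile=2))
```

Tiling changes only the order in which multiply-accumulates happen, not the result (up to floating-point rounding), which is why hardware engines can pick tile sizes that fit their local buffers.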