Emerging technologies provide SoCs with fine-grained DVFS capabilities both in space (number of domains) and time (transients on the order of tens of nanoseconds). Analyzing these systems requires cycle-accurate accounting of rapidly changing dynamics and of the complex interactions among accelerators, interconnect, memory, and the OS. We present an FPGA-based infrastructure that facilitates such analyses for high-performance embedded systems. We show how our infrastructure can be used to first generate SoCs with loosely-coupled accelerators and then perform design-space exploration considering several DVFS policies under full-system workload scenarios, sweeping the spatial and temporal granularity of the DVFS domains.
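To make the kind of sweep described above concrete, the following Python sketch enumerates hypothetical DVFS configurations (number of voltage/frequency domains, policy update period, and policy) and ranks them by energy-delay product. The policy names, parameter ranges, and the toy `simulate_workload` model are illustrative assumptions, not the interface or cost model of the infrastructure presented in the paper.

```python
# Minimal sketch of a DVFS design-space sweep (illustrative only).
from itertools import product

POLICIES = ["race-to-idle", "balanced", "energy-min"]   # assumed policy set
DOMAIN_COUNTS = [1, 2, 4, 8]                            # spatial granularity
UPDATE_PERIODS_NS = [50, 500, 5000]                     # temporal granularity

def simulate_workload(num_domains, period_ns, policy):
    """Toy analytical stand-in for a full-system (e.g., FPGA-emulated) run.

    Purely illustrative: finer spatial/temporal granularity saves energy but
    adds policy-invocation overhead.  Returns (execution_time_s, energy_j).
    """
    base_time, base_energy = 1.0, 1.0
    overhead = 1e-4 * num_domains * (1e6 / period_ns)
    saving = (0.02 if policy == "race-to-idle" else 0.05) * num_domains
    return base_time + overhead, max(0.1, base_energy - saving)

def sweep():
    results = []
    for domains, period, policy in product(DOMAIN_COUNTS,
                                           UPDATE_PERIODS_NS,
                                           POLICIES):
        time_s, energy_j = simulate_workload(domains, period, policy)
        results.append({"domains": domains, "period_ns": period,
                        "policy": policy, "time_s": time_s,
                        "energy_j": energy_j, "edp": time_s * energy_j})
    # Rank configurations by energy-delay product as one possible figure of merit.
    return sorted(results, key=lambda r: r["edp"])

if __name__ == "__main__":
    print(sweep()[0])
```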
In modern system-on-chip architectures, specialized accelerators are increasingly used to improve performance and energy efficiency. The growing complexity of these systems requires the use of system-level design methodologies featuring high-level synthesis (HLS) for generating these components efficiently. Existing HLS tools, however, have limited support for the system-level optimization of memory elements, which typically occupy most of the accelerator area. We present a complete methodology for designing the private local memories (PLMs) of multiple accelerators. Based on the memory requirements of each accelerator, our methodology automatically determines an area-efficient architecture for the PLMs to guarantee performance and reduce the memory cost based on technology-related information. We implemented a prototype tool, called MNEMOSYNE, that embodies our methodology within a commercial HLS flow. We designed 13 complex accelerators for selected applications from two recently released benchmark suites (PERFECT and CORTEXSUITE). With our approach we are able to reduce the memory cost of single accelerators by up to 45%. Moreover, when reusing memory IPs across accelerators, we achieve area savings that range between 17% and 55% compared to the case where the PLMs are designed separately.

Index Terms: Hardware accelerator, high-level synthesis (HLS), memory design, multibank architecture.

I. INTRODUCTION

System-on-chip (SoC) architectures increasingly feature hardware accelerators to achieve energy-efficient high performance [1]. Complex applications leverage these specialized components to improve the execution of selected computational kernels [2], [3]. For example, hardware accelerators for machine learning applications are increasingly used to identify underlying relations in massive unstructured data [4]-[6]. Many of these algorithms first build an internal model by analyzing very large data sets; then, they leverage […]
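The following Python sketch illustrates the intuition behind multi-bank PLM sizing and cross-accelerator memory reuse. The bank parameters, cost model, and the rule that mutually exclusive accelerators can share banks are simplifying assumptions for illustration; they are not MNEMOSYNE's actual algorithms or technology data.

```python
# Illustrative model of PLM bank allocation and sharing (hypothetical cost model).
from dataclasses import dataclass
from math import ceil

@dataclass
class Buffer:
    words: int        # logical depth required by the accelerator
    width: int        # word width in bits
    read_ports: int   # parallel reads needed to sustain the HLS schedule
    write_ports: int  # parallel writes needed

BANK_DEPTH = 2048     # assumed depth of one SRAM primitive
BANK_WIDTH = 32       # assumed width of one SRAM primitive (bits)

def banks_needed(buf: Buffer) -> int:
    """Banks needed to provide both the capacity and the port parallelism."""
    capacity_banks = ceil(buf.words / BANK_DEPTH) * ceil(buf.width / BANK_WIDTH)
    port_banks = max(buf.read_ports, buf.write_ports)
    return max(capacity_banks, port_banks)

def shared_banks(plm_a: list, plm_b: list) -> int:
    """If two accelerators never run concurrently, their PLMs can reuse the
    same physical banks: the shared cost is the maximum, not the sum."""
    a = sum(banks_needed(b) for b in plm_a)
    b = sum(banks_needed(b) for b in plm_b)
    return max(a, b)

# Toy usage: two accelerators whose PLMs are compatible for sharing.
fft_plm  = [Buffer(words=4096, width=32, read_ports=2, write_ports=2)]
sort_plm = [Buffer(words=8192, width=32, read_ports=1, write_ports=1)]
print(shared_banks(fft_plm, sort_plm), "banks when shared vs.",
      sum(banks_needed(b) for b in fft_plm + sort_plm), "when separate")
```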
ESP is an open-source research platform for heterogeneous SoC design. The platform combines a modular tile-based architecture with a variety of application-oriented flows for the design and optimization of accelerators. The ESP architecture is highly scalable and strikes a balance between regularity and specialization. The companion methodology raises the level of abstraction to system-level design and enables an automated flow from software and hardware development to full-system prototyping on FPGA. For application developers, ESP offers domain-specific automated solutions to synthesize new accelerators for their software and to map complex workloads onto the SoC architecture. For hardware engineers, ESP offers automated solutions to integrate their accelerator designs into the complete SoC. Conceived as a heterogeneous integration platform and tested through years of teaching at Columbia University, ESP supports the open-source hardware community by providing a flexible platform for agile SoC development.
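The tile-based organization described above can be pictured with a small configuration model like the one sketched below. The tile types, grid size, and field names are illustrative assumptions only; they do not correspond to ESP's actual configuration files, tools, or RTL.

```python
# Illustrative model of a tile-based SoC floorplan (not ESP's real config format).
from dataclasses import dataclass, field
from enum import Enum

class TileType(Enum):
    CPU = "cpu"
    ACCELERATOR = "accelerator"
    MEMORY = "memory"
    IO = "io"

@dataclass
class Tile:
    kind: TileType
    name: str                 # e.g., an HLS-generated accelerator instance
    clock_domain: int = 0     # tiles may sit in independent DVFS domains

@dataclass
class SoCGrid:
    rows: int
    cols: int
    tiles: list = field(default_factory=list)

    def add(self, tile: Tile):
        assert len(self.tiles) < self.rows * self.cols, "grid is full"
        self.tiles.append(tile)

# Example: a 2x2 SoC with one CPU tile, one memory tile, and two specialized tiles.
soc = SoCGrid(rows=2, cols=2)
soc.add(Tile(TileType.CPU, "cpu0"))
soc.add(Tile(TileType.MEMORY, "mem0"))
soc.add(Tile(TileType.ACCELERATOR, "fft_acc", clock_domain=1))
soc.add(Tile(TileType.IO, "uart0"))
```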
CCS Concepts: • Computer systems organization → System on a chip; • Hardware → Methodologies for EDA.
Local memory is a key factor for the performance of accelerators in SoCs. Despite technology scaling, the gap between on-chip storage and the memory footprint of embedded applications keeps widening. We present a solution to preserve the speedup of accelerators when scaling from small to large data sets. By combining specialized DMA and address translation in hardware with a software layer in Linux, our design is transparent to user applications and broadly applicable to any class of SoCs hosting high-throughput accelerators. We demonstrate the robustness of our design across many heterogeneous workload scenarios and memory allocation policies with FPGA-based SoC prototypes featuring twelve concurrent accelerators accessing up to 768 MB out of 1 GB of addressable DRAM.
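The sketch below models the general idea of letting an accelerator see one large contiguous buffer while the underlying physical memory is scattered across pages: software builds a flat page table that a DMA engine could walk, and a translation step maps accelerator-visible offsets to physical addresses. Page size, function names, and the toy allocator are assumptions for illustration and do not reflect the paper's actual hardware/Linux interfaces.

```python
# Illustrative model of accelerator-side address translation over scattered pages.
import random

PAGE_SIZE = 4096  # assumed page granularity

def build_page_table(buffer_bytes: int, allocate_page) -> list:
    """Allocate enough physical pages for a large accelerator buffer and record
    their base addresses, in order, as a flat table a DMA engine could walk."""
    n_pages = (buffer_bytes + PAGE_SIZE - 1) // PAGE_SIZE
    return [allocate_page() for _ in range(n_pages)]

def translate(page_table: list, offset: int) -> int:
    """Map a contiguous accelerator-visible offset to a physical address, so the
    accelerator sees one large buffer even though memory is physically scattered."""
    index, within = divmod(offset, PAGE_SIZE)
    return page_table[index] + within

# Toy usage: a fake allocator hands out 256 scattered 4 KB pages for a 1 MB buffer.
fake_pages = iter(sorted(random.sample(range(0, 1 << 30, PAGE_SIZE), 256)))
table = build_page_table(1 << 20, lambda: next(fake_pages))
phys_addr = translate(table, 123456)
```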
The design of specialized accelerators is essential to the success of many modern systems-on-chip. Electronic system-level design methodologies and high-level synthesis tools are critical for the efficient design and optimization of an accelerator. Still, these methodologies and tools offer only limited support for the optimization of the memory structures, which are often responsible for most of the area occupied by an accelerator. To address these limitations, we present a novel methodology to automatically derive the memory subsystems of SoC accelerators. Our approach enables compositional design-space exploration and promotes design reuse of the accelerator specifications. We illustrate its effectiveness by presenting experimental results on the design of two accelerators for a high-performance embedded application.
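One way to picture compositional design-space exploration is as combining per-accelerator design points into system-level options and pruning the dominated ones. In the Python sketch below, the point values and the composition rule (areas and latencies simply add) are simplifying assumptions made for illustration, not the paper's methodology or results.

```python
# Illustrative sketch of compositional design-space exploration.
from itertools import product

# Hypothetical per-accelerator design points as (area mm^2, latency us).
fft_points  = [(1.0, 900), (1.6, 620), (2.4, 480)]
sort_points = [(0.8, 300), (1.2, 210)]

def pareto(points):
    """Keep only points not dominated in both area and latency."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

# Compose accelerator-level points into system-level candidates, then prune.
system = [(a1 + a2, l1 + l2)
          for (a1, l1), (a2, l2) in product(fft_points, sort_points)]
frontier = sorted(pareto(system))
print(frontier)
```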