Gerald Hempel scite author profile

Using soft-core processors on FPGAs offers the opportunity to customize the system design in order to accelerate the application. While this has always been possible manually by hardware designers, it requires distinct knowledge of design methods and of the microarchitecture of the soft-core. In this paper we show that a mature compiler like the GCC can be used for automatic generation of processor customizations directly from the C code of the application. To this end, we have extended the GCC to automatically select candidate sequences of the whole application and transform them into hardware extensions.


A Comparison of Hardware Acceleration Interfaces in a Customizable Soft Core Processor

Hempel

Hochberger

Koch

2010


Due to the continuously decreasing cost of FPGAs, they have become a valid implementation platform for SOCs. Typically, a soft core processor implementation is used to execute the software parts of the SOC. As each system is individually designed for a particular application, the idea is natural to support compute intensive parts of the code through customized hardware acceleration. Two different architectural variants have been proposed for this purpose in SOCs: either as an instruction set extension with specialized pipeline implementation or as a peripheral component that is programmed through memory mapping. In this contribution we analyze the efficiency (speedup related to LUTs) of those two variants.


From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics

Friebel

Soldavini

Hempel

et al. 2021


Identifying homogenous reconfigurable regions in heterogeneous FPGAs for module relocation

Backasch

Hempel

Werner

et al. 2014


Robust Mapping of Process Networks to Many-Core Systems using Bio-Inspired Design Centering

Hempel

Goens

Castrillón

et al. 2017


Embedded systems are often designed as complex architectures with numerous processing elements. Effectively programming such systems requires parallel programming models, e.g. task-based or dataflow-based models. With these types of models, the mapping of the abstract application model to the existing hardware architecture plays a decisive role and is usually optimized to achieve an ideal resource footprint or a near-minimal execution time. However, when mapping several independent programs to the same platform, resource conflicts can arise. This can be circumvented by remapping some of the tasks of an application, which in turn affect its timing behavior, possibly leading to constraint violations. In this work we present a novel method to compute mappings that are robust against local task remapping. The underlying method is based on the bio-inspired design centering algorithm of Lp-Adaptation. We evaluate this with several benchmarks on different platforms and show that mappings obtained with our algorithm are indeed robust. In all experiments, our robust mappings tolerated significantly more run-time perturbations without violating constraints than mappings devised with optimization heuristics.


Automatic Creation of High-bandwidth Memory Architectures from Domain-specific Languages: The Case of Computational Fluid Dynamics

Soldavini

Friebel

Tibaldi

et al. 2023

ACM Trans. Reconfigurable Technol. Syst.


Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory technologies, but when applications are memory-bound designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon in domain-specific experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) for tensor expressions to generate massively-parallel accelerators on HBM-equipped FPGAs. Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to generate systems with parallel accelerators and a specialized memory architecture that moves data efficiently, aiming at fully exploiting the available CPU-FPGA bandwidth. We simulated applications with millions of elements, achieving up to 103 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is up to 25× more energy efficient than expert-crafted Intel CPU implementations.


A resource optimized SoC Kit for FPGAs

Hempel

Hochberger

2007


Modern FPGAs have become so affordable that they can be used to substitute ASICs in mass produced devices. A key component of such configurable system on a chip (CSoC) is the processor core. Available and usable cores are either 32 or 8 bit wide. Thus, there is a gap between these two extremes, which we want to fill with our SoC kit. In this contribution we elaborate on our SoC kit and its components and compare it to other SoC design environments.



A resource optimized Processor Core for FPGA based SoCs

Towards GCC-based automatic soft-core customization
