Dillon Huff scite author profile

Dillon Huff

5 Publications

43 Citation Statements Received

40 Citation Statements Given

Affiliations

Stanford University

Publications

Type-directed scheduling of streaming accelerators

Durst, Feldman, Huff, et al. 2020

Designing efficient, application-specialized hardware accelerators requires assessing trade-offs between a hardware module's performance and resource requirements. To facilitate hardware design space exploration, we describe Aetherling, a system for automatically compiling data-parallel programs into statically scheduled, streaming hardware circuits. Aetherling contributes a space- and time-aware intermediate language featuring data-parallel operators that represent parallel or sequential hardware modules, and sequence data types that encode a module's throughput by specifying when sequence elements are produced or consumed. As a result, well-typed operator composition in the space-time language corresponds to connecting hardware modules via statically scheduled, streaming interfaces. We provide rules for transforming programs written in a standard data-parallel language (that carries no information about hardware implementation) into equivalent space-time language programs. We then provide a scheduling algorithm that searches over the space of transformations to quickly generate area-efficient hardware designs that achieve a programmer-specified throughput. Using benchmarks from the image processing domain, we demonstrate that Aetherling enables rapid exploration of hardware designs with different throughput and area characteristics, and yields results that require 1.8-7.9× fewer FPGA slices than those of prior hardware generation systems.

CoSA: Integrated Verification for Agile Hardware Design

Mattarei, Mann, Barrett, et al. 2018

Clockwork: Resource-Efficient Static Scheduling for Multi-Rate Image Processing Applications on FPGAs

Huff, Dai, Hanrahan 2021

Creating an Agile Hardware Design Flow

Bahr, Barrett, Bhagdikar, et al. 2020

Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators

Liu, Setter, Huff, et al. 2023

ACM Trans. Archit. Code Optim.

Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle-ground, but they have traditionally been difficult compiler targets since they use a different memory abstraction. In contrast to CPUs and GPUs, the memory hierarchies of domain-specific accelerators use push memories: memories that send input data streams to computation kernels or to higher or lower levels in the memory hierarchy, and store the resulting output data streams. To address the compilation challenge caused by push memories, we propose that the representation of these memories in the compiler be altered to directly represent them by combining storage with address generation and control logic in a single structure: a unified buffer. The unified buffer abstraction enables the compiler to separate generic push memory optimizations from the mapping to specific memory implementations in the backend. This separation allows our compiler to map high-level Halide applications to different CGRA memory designs, including some with a ready-valid interface. The separation also opens the opportunity for optimizing push memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer (PUB), uses a wide-fetch, single-port SRAM macro with built-in address generation logic to implement a buffer with two read and two write ports. It is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory that only supports two ports. Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7× better runtime and 3.5× better energy-efficiency compared to an FPGA.
