Patrick Cooke scite author profile

With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers are confronted with the problem of searching a huge design space that has been shown to have widely varying performance and energy metrics for different accelerators, different application domains, and different use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that FPGAs can achieve speedups of up to 11x and 57x compared to GPUs and multicores, respectively, while also using orders of magnitude less energy.

A Tradeoff Analysis of FPGAs, GPUs, and Multicores for Sliding-Window Applications

Cooke, Fowers, Brown, et al. 2015. ACM Trans. Reconfigurable Technol. Syst.

The increasing usage of hardware accelerators such as Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) has significantly increased application design complexity. Such complexity results from a larger design space created by numerous combinations of accelerators, algorithms, and hw/sw partitions. Exploration of this increased design space is critical due to widely varying performance and energy consumption for each accelerator when used for different application domains and different use cases. To address this problem, numerous studies have evaluated specific applications across different architectures. In this article, we analyze an important domain of applications, referred to as sliding-window applications, implemented on FPGAs, GPUs, and multicore CPUs. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that, for large input sizes, FPGAs can achieve speedups of up to 5.6× and 58× compared to GPUs and multicore CPUs, respectively, while also using up to an order of magnitude less energy. For small input sizes and applications with frequency-domain algorithms, GPUs generally provide the best performance and energy.

Finite-State-Machine Overlay Architectures for Fast FPGA Compilation and Application Portability

Cooke, Hao, Stitt. 2015. ACM Trans. Embed. Comput. Syst.

Despite significant advantages, wider usage of field-programmable gate arrays (FPGAs) has been limited by lengthy compilation and a lack of portability. Virtual-architecture overlays have partially addressed these problems, but previous work focuses mainly on heavily pipelined applications with minimal control requirements. We expand previous work by enabling more flexible control via overlay architectures for finite-state machines. Although not appropriate for control-intensive circuits, the presented architectures reduced compilation times of control changes in a convolution case study from 7 hours to less than 1 second, with no performance overhead and an area overhead of 0.2%.

A Parallel Sliding-Window Generator for High-Performance Digital-Signal Processing on FPGAs

Stitt, Schwartz, Cooke. 2016. ACM Trans. Reconfigurable Technol. Syst.

Sliding-window applications, an important class of the digital-signal processing domain, are highly amenable to pipeline parallelism on field-programmable gate arrays (FPGAs). Although memory bandwidth often restricts parallelism for many applications, sliding-window applications can leverage custom buffers, referred to as sliding-window generators, that provide massive input bandwidth that far exceeds the capabilities of external memory. Previous work has introduced a variety of sliding-window generators, but those approaches typically generate at most one window per cycle, which significantly restricts parallelism. In this article, we address this limitation with a parallel sliding-window generator that can generate a configurable number of windows every cycle. Although in practice the number of parallel windows is limited by memory bandwidth, we show that even with common bandwidth limitations, the presented generator enables near-linear speedups of up to 16x over previous FPGA studies that generate a single window per cycle, which were already in some cases faster than graphics-processing units and microprocessors.
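
For readers unfamiliar with the term, the following is a minimal software analogy of the idea in this abstract, not the hardware design from the article. It assumes a 1D signal and stride 1 for brevity, and the names `windows`, `windows_per_cycle`, and `cycles_needed` are hypothetical. It shows how emitting several overlapping windows per simulated cycle reduces the cycle count compared with emitting one window per cycle.

```python
from typing import Iterator, List, Sequence


def windows(signal: Sequence[int], size: int) -> List[Sequence[int]]:
    """Return every contiguous window of length `size` (stride 1)."""
    return [signal[i:i + size] for i in range(len(signal) - size + 1)]


def windows_per_cycle(
    signal: Sequence[int], size: int, parallel: int
) -> Iterator[List[Sequence[int]]]:
    """Yield up to `parallel` overlapping windows per simulated cycle.

    This only models the output schedule; a hardware generator would
    build the windows from on-chip buffers rather than by slicing.
    """
    all_windows = windows(signal, size)
    for start in range(0, len(all_windows), parallel):
        yield all_windows[start:start + parallel]


def cycles_needed(n: int, size: int, parallel: int) -> int:
    """Simulated cycles to emit all windows of an n-element signal."""
    total = max(0, n - size + 1)
    return -(-total // parallel)  # ceiling division


if __name__ == "__main__":
    signal = list(range(10))
    for cycle, group in enumerate(windows_per_cycle(signal, size=3, parallel=4)):
        print(cycle, group)
    print(cycles_needed(len(signal), 3, 1), cycles_needed(len(signal), 3, 4))
```

In this toy model the cycle count shrinks in proportion to the number of windows emitted per cycle, which is the intuition behind the near-linear speedups reported above; the model ignores the memory-bandwidth limits that the abstract identifies as the practical constraint.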

A comparison of correntropy-based feature tracking on FPGAs and GPUs

Cooke, Fowers, Stitt, et al. 2013.

A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
