2013
DOI: 10.1145/2400682.2400684

A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors

Abstract: Recent architectural trends have focused on increased parallelism via multicore processors and increased heterogeneity via accelerator devices (e.g., graphics-processing units, field-programmable gate arrays). Although these architectures have significant performance and energy potential, application designers face many device-specific challenges when choosing an appropriate accelerator or when customizing an algorithm for an accelerator. To help address this problem, in this article we thoroughly evaluate con…

Cited by 30 publications (26 citation statements)
References 24 publications (28 reference statements)

“…Numerous earlier studies have evaluated FPGA performance for sliding-window applications, showing a variety of performance and energy advantages and trade-offs compared to microprocessors [Guo et al 2004] and GPUs [Fowers et al 2013; Baker et al 2007; Cope et al 2005; Asano et al 2009; Pauwels et al 2012]. The presented approach is complementary to these previous studies, potentially enabling significantly better FPGA performance via parallel windows.…”
Section: Previous Work (mentioning)
confidence: 95%
“…For 2D convolution, these values will commonly be small (e.g., 3×3 and 5×5). A 1D convolution uses a single row, but the maximum window columns would be as large as the kernel, which commonly requires thousands of elements [Fowers et al 2013]. For image-comparison applications (e.g., template matching), the maximum window dimensions would be as large as the maximum dimensions of all tested images.…”
Section: Configuration Options (mentioning)
confidence: 99%
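The statement above contrasts small 2D windows (3×3, 5×5) with 1D windows that span the entire kernel. A minimal time-domain sketch makes the cost of that difference visible; it assumes a zero-padded "full" convolution output, and the function name `conv1d_sliding_window` is hypothetical, not taken from Fowers et al. Each output sample is a dot product over a K-element window, so an N-sample signal costs on the order of N*K multiply-adds, which grows quickly when K reaches thousands of elements.

```python
def conv1d_sliding_window(signal, kernel):
    """Direct (time-domain) full linear convolution via a sliding window.

    Hypothetical helper for illustration: y[n] = sum_i kernel[i] * signal[n - i],
    producing len(signal) + len(kernel) - 1 output samples.
    """
    k = len(kernel)
    rev = kernel[::-1]  # convolution flips the kernel relative to the window
    # Zero-pad both ends so every output position sees a full k-sample window.
    padded = [0.0] * (k - 1) + list(signal) + [0.0] * (k - 1)
    out = []
    for start in range(len(padded) - k + 1):
        window = padded[start:start + k]
        # One window = k multiply-adds; this inner product dominates the cost.
        out.append(sum(w * r for w, r in zip(window, rev)))
    return out
```

For example, convolving `[1, 2, 3]` with `[1, 1]` gives `[1, 3, 5, 3]`. A 2D version would slide a small rectangular window over rows and columns in the same way.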
“…A thorough investigation of 1D convolution across different platforms is done in [Fowers et al 2013]. The 1D convolution is implemented in both time-domain using the overlap-save algorithm and frequency-domain using the overlap-save algorithm.…”
Section: One-dimensional Convolution (mentioning)
confidence: 99%
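The overlap-save method named in the statement above splits a long signal into overlapping blocks, convolves each block with the kernel by pointwise multiplication in the frequency domain, and discards the samples corrupted by circular wrap-around. The following is a minimal sketch under stated assumptions, not the implementation evaluated by Fowers et al.: it uses a hand-written radix-2 FFT so that it runs with only the standard library, and the names `fft`, `ifft`, `overlap_save`, and the `fft_size` parameter are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def overlap_save(signal, kernel, fft_size=None):
    """Full linear convolution of signal with kernel using overlap-save blocks."""
    m = len(kernel)
    if m == 0:
        raise ValueError("kernel must be non-empty")
    if fft_size is None:
        # Hypothetical default: smallest power of two >= twice the kernel length.
        fft_size = 1
        while fft_size < 2 * m:
            fft_size *= 2
    if fft_size < m or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two >= len(kernel)")
    step = fft_size - (m - 1)  # valid output samples produced per block
    # The kernel spectrum is computed once and reused for every block.
    kernel_spec = fft(list(kernel) + [0.0] * (fft_size - m))
    total = len(signal) + m - 1  # length of the full linear convolution
    # Prepend m-1 zeros as the initial "saved" history; pad the tail so the
    # last block always holds a full fft_size samples.
    padded = [0.0] * (m - 1) + list(signal) + [0.0] * fft_size
    out = []
    pos = 0
    while len(out) < total:
        block = padded[pos:pos + fft_size]
        circular = ifft([a * b for a, b in zip(fft(block), kernel_spec)])
        # The first m-1 samples wrap around circularly and are discarded.
        out.extend(v.real for v in circular[m - 1:])
        pos += step  # consecutive blocks overlap by m-1 input samples
    return out[:total]
```

The result matches direct time-domain convolution up to floating-point rounding. The block size is a trade-off: larger FFTs amortize the m-1 discarded samples per block but cost more per transform; a practical version would call an optimized FFT library instead of this recursive one.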
“…Jeremy Fowers et al studied the 1D convolution for image processing on the CPU, GPU and FPGA platforms [9].…”
Section: Related Work (mentioning)
confidence: 99%