2009 15th International Conference on Parallel and Distributed Systems
DOI: 10.1109/icpads.2009.79

Optimal Data Distribution for Versatile Finite Impulse Response Filtering on Next-Generation Graphics Hardware Using CUDA

Citations: cited by 10 publications (11 citation statements)
References: 6 publications
“…The algorithm is in essence a 2D linear FIR filtering. We already investigated FIR filtering using CUDA in our previous research (Goorts et al, 2009). Because the kernels are small, separating the kernels in 1D filters or using the Fourier transforms will not result in a speedup.…”
Section: FIR Filtering for Demosaicing (mentioning)
confidence: 99%
“…This method is chosen because it uses linear finite impulse response (FIR) filtering to produce high-quality results. FIR filtering is known to map very well on CUDA (Goorts et al, 2009), which will maximize the performance, while preserving the quality. This method is implemented earlier by (McGuire, 2008) using traditional GPGPU paradigms, but these optimization principles do not map to CUDA.…”
Section: Introduction (mentioning)
confidence: 99%
“…This depth represents the global depth d_l of the player (or group of players) in that foreground object. CUDA is used to harness the utilization of a user-managed cache, allowing more efficient memory management than can be obtained by normal texture lookups (Goorts et al, 2009). By exploiting the interoperability of Cg and CUDA, no performance penalty is perceived.…”
Section: Depth Selection (mentioning)
confidence: 99%
“…Their adjustment of register pressure is ad-hoc, and not as systematic as ours. [18] and [19] separated a complex kernel into several simple kernels, which brings higher occupancy, due to less resource requirement of each simplified kernel. These techniques are either application-specific, or not suitable for stencils.…”
Section: Related Work (mentioning)
confidence: 99%