Abstract: In this paper, we investigate demosaicing of raw camera images on parallel architectures using CUDA. To generate high-quality results, we use the method of Malvar et al., which incorporates gradient information for edge-sensing demosaicing. The method can be implemented as a collection of finite impulse response (FIR) filters, which map easily onto a parallel architecture. We investigated different trade-offs between memory operations and processor occupancy to achieve maximum performance, and found a clear difference in optimization principles between different GPU architecture designs. We show that such trade-offs remain important and non-trivial on systems that pair many fast processors with slower memory.
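To make the FIR formulation mentioned in the abstract concrete, below is a minimal sketch (not the paper's implementation) of one of Malvar et al.'s gradient-corrected filters as a naive CUDA kernel: estimating green at red/blue sites as a bilinear average of the four green neighbours plus a gradient correction from the centre channel. The kernel name, RGGB layout, float buffers, and border handling are all assumptions for illustration.

```cuda
#include <cuda_runtime.h>

// Sketch: green channel at red/blue pixels via the Malvar-He-Cutler
// 5x5 filter (coefficients scaled by 1/8). Assumes a single-channel
// float Bayer mosaic in RGGB layout; interior pixels only.
__global__ void greenAtRedBlue(const float* bayer, float* green,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Skip the 2-pixel border required by the 5x5 filter support.
    if (x < 2 || y < 2 || x >= width - 2 || y >= height - 2)
        return;

    int idx = y * width + x;

    // In RGGB, green is measured where x and y have different parity.
    if ((x & 1) != (y & 1)) {
        green[idx] = bayer[idx];
        return;
    }

    // Bilinear average of the four adjacent green samples...
    float g = 2.0f * (bayer[idx - 1] + bayer[idx + 1] +
                      bayer[idx - width] + bayer[idx + width]);
    // ...plus a gradient correction from the centre colour channel,
    // sampled two pixels away in each direction.
    float c = 4.0f * bayer[idx]
            - (bayer[idx - 2] + bayer[idx + 2] +
               bayer[idx - 2 * width] + bayer[idx + 2 * width]);

    green[idx] = (g + c) * 0.125f;
}
```

One thread per pixel with a 2D launch (e.g. 16x16 blocks) is the natural mapping; because each thread reads nine global-memory values, this naive version is exactly where the memory-versus-occupancy trade-offs discussed in the paper come into play (e.g. staging the filter support in shared memory or relying on texture/L1 caching).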