Abstract: In this paper, we investigate demosaicing of raw camera images on parallel architectures using CUDA. To generate high-quality results, we use the method of Malvar et al., which incorporates gradient information for edge-sensing demosaicing. The method can be implemented as a collection of finite impulse response (FIR) filters, which map easily onto a parallel architecture. We investigated different trade-offs between memory operations and processor occupancy to achieve maximum performance, and found a clear difference in optimization principles between different GPU architecture designs. We show that such trade-offs remain important and non-trivial on systems that pair many fast processors with slower memory.
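To make the FIR formulation mentioned in the abstract concrete, below is a minimal sketch (not the paper's implementation) of one of Malvar et al.'s gradient-corrected filters as a naive CUDA kernel: estimating green at red/blue sites as a bilinear average of the four green neighbours plus a gradient correction from the centre channel. The kernel name, RGGB layout, float buffers, and border handling are all assumptions for illustration.

```cuda
#include <cuda_runtime.h>

// Sketch: green channel at red/blue pixels via the Malvar-He-Cutler
// 5x5 filter (coefficients scaled by 1/8). Assumes a single-channel
// float Bayer mosaic in RGGB layout; interior pixels only.
__global__ void greenAtRedBlue(const float* bayer, float* green,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Skip the 2-pixel border required by the 5x5 filter support.
    if (x < 2 || y < 2 || x >= width - 2 || y >= height - 2)
        return;

    int idx = y * width + x;

    // In RGGB, green is measured where x and y have different parity.
    if ((x & 1) != (y & 1)) {
        green[idx] = bayer[idx];
        return;
    }

    // Bilinear average of the four adjacent green samples...
    float g = 2.0f * (bayer[idx - 1] + bayer[idx + 1] +
                      bayer[idx - width] + bayer[idx + width]);
    // ...plus a gradient correction from the centre colour channel,
    // sampled two pixels away in each direction.
    float c = 4.0f * bayer[idx]
            - (bayer[idx - 2] + bayer[idx + 2] +
               bayer[idx - 2 * width] + bayer[idx + 2 * width]);

    green[idx] = (g + c) * 0.125f;
}
```

One thread per pixel with a 2D launch (e.g. 16x16 blocks) is the natural mapping; because each thread reads nine global-memory values, this naive version is exactly where the memory-versus-occupancy trade-offs discussed in the paper come into play (e.g. staging the filter support in shared memory or relying on texture/L1 caching).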