Burst Denoising with Kernel Prediction Networks

Mildenhall, Ben; Barron, Jonathan T.; Chen, Jiawen; Sharlet, Dillon; Ng, Ren; Carroll, Robert E.

doi:10.1109/cvpr.2018.00265

Cited by 397 publications

(455 citation statements)

References 22 publications

Supporting

Mentioning

439

Contrasting


“…In recent work, the use of deep convolutional neural networks (CNNs) has become a common theme to improve image processing algorithms for a better imaging pipeline. Examples include models that perform demosaicing [4], denoising [5,6], and many other types of image enhancement and transformation methods [7,8,9].…”

Section: Related Work (mentioning)

confidence: 99%

VisionISP: Repurposing the Image Signal Processor for Computer Vision Applications

Isikdogan, Sushma, et al. 2019

2019 IEEE International Conference on Image Processing (ICIP)


Traditional image signal processors (ISPs) are primarily designed and optimized to improve the image quality perceived by humans. However, optimal perceptual image quality does not always translate into optimal performance for computer vision applications. We propose a set of methods, which we collectively call VisionISP, to repurpose the ISP for machine consumption. VisionISP significantly reduces data transmission needs by reducing the bit-depth and resolution while preserving the relevant information. The blocks in VisionISP are simple, content-aware, and trainable. Experimental results show that VisionISP boosts the performance of a subsequent computer vision system trained to detect objects in an autonomous driving setting. The results demonstrate the potential and the practicality of VisionISP for computer vision applications.


“…This allows us to aggregate sparse but highly related samples only, enabling an efficient implementation in terms of speed and memory and achieving state-of-the-art results even with kernels of size 3×3 on several tasks. For comparison, the adaptive convolution and kernel prediction networks require much larger neighbors (e.g., 21 × 21 in (Bako et al 2017;Vogels et al 2018), 41 × 41 in (Niklaus et al 2017), and 8×5×5 in (Mildenhall et al 2018)). As will be seen in our experiments, learning sampling locations of neighbors clearly boosts the performance significantly compared to learning kernel weights only.…”

Section: Variants Of the Spatial Transformer (mentioning)

confidence: 99%

“…multi-modal images (e.g., RGB/D images in depth map upsampling and RGB/cost images in semantic segmentation), and thus is applicable to various tasks including depth and saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In contrast, the adaptive convolution network is specialized to video frame interpolation, and kernel prediction networks are applicable to denoising Monte Carlo renderings (Bako et al 2017;Vogels et al 2018) or burst denoising (Mildenhall et al 2018) only. Finally, our model learns spatially-variant kernels to compute residual images, not a final output as in (Bako et al 2017;Jia et al 2016;Mildenhall et al 2018;Niklaus et al 2017;Vogels et al 2018), with constraints on weight regression.…”

Section: Variants Of the Spatial Transformer (mentioning)

confidence: 99%

“…In contrast, the adaptive convolution network is specialized to video frame interpolation, and kernel prediction networks are applicable to denoising Monte Carlo renderings (Bako et al 2017;Vogels et al 2018) or burst denoising (Mildenhall et al 2018) only. Finally, our model learns spatially-variant kernels to compute residual images, not a final output as in (Bako et al 2017;Jia et al 2016;Mildenhall et al 2018;Niklaus et al 2017;Vogels et al 2018), with constraints on weight regression. This allows the use of residual connections for adaptive convolution and kernel prediction networks, and achieves better results.…”

Section: Variants Of the Spatial Transformer (mentioning)

confidence: 99%


3D Interpreter Networks for Viewer-Centered Wireframe Modeling

et al. 2018


Joint image filters are used to transfer structural details from a guidance picture used as a prior to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 × 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
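As a rough illustration of the weighted averaging described in the abstract above, the following Python sketch applies a separate 3 × 3 kernel at every pixel. It is not the DKN implementation: the function name `apply_per_pixel_kernels`, the fixed 3 × 3 neighborhood, the clamped border handling, and the requirement that each kernel's weights have a nonzero sum are all illustrative assumptions, and the kernels are supplied by hand rather than predicted by a network.

```python
def apply_per_pixel_kernels(image, kernels):
    """Filter `image` with a different 3x3 kernel at every pixel.

    image:   H x W list of lists of floats.
    kernels: H x W list of lists; each entry is a 3x3 list of weights
             (a hypothetical stand-in for what a network would predict).

    Each kernel is normalized by the sum of its weights, so every output
    pixel is a weighted average of its neighborhood. Neighbors outside
    the image are read from the nearest valid pixel (an assumption).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = kernels[y][x]
            total = sum(sum(row) for row in k)  # assumed nonzero
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += k[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc / total
    return out
```

With a kernel whose only nonzero weight is the center, the function returns the input unchanged; with uniform weights it reduces to an ordinary box filter. A spatially-variant filter lies between these cases, with the weights chosen per pixel.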


“…Several recent works proposed to leverage neural networks to infer locally optimal parameters for regression models [KBS15], reconstruct a noise‐free image using predicted kernels [BVM*17, MBC*17] or produce the image directly [CKS*17]. While deep learning will undoubtedly offset denoising performance in the future, the acquisition of sufficiently large training sets (there are currently none with deep images), the increased memory requirements due to the deep structure and their relatively poor generalization currently permit deployment only in big production houses.…”

Section: Related Work (mentioning)

confidence: 99%

Denoising Deep Monte Carlo Renderings

Vicini, Adler, Novák, et al. 2018

Computer Graphics Forum


We present a novel algorithm to denoise deep Monte Carlo renderings, in which pixels contain multiple colour values, each for a different range of depths. Deep images are a more expressive representation of the scene than conventional flat images. However, since each depth bin receives only a fraction of the flat pixel's samples, denoising the bins is harder due to the less accurate mean and variance estimates. Furthermore, deep images lack a regular structure in depth—the number of depth bins and their depth ranges vary across pixels. This prevents a straightforward application of patch‐based distance metrics frequently used to improve the robustness of existing denoising filters. We address these constraints by combining a flat image‐space non‐local means filter operating on pixel colours with a deep cross‐bilateral filter operating on auxiliary features (albedo, normal, etc.). Our approach significantly reduces noise in deep images while preserving their structure. To our best knowledge, our algorithm is the first to enable efficient deep‐compositing workflows with denoised Monte Carlo renderings. We demonstrate the performance of our filter on a range of scenes highlighting the challenges and advantages of denoising deep images.
