Figure: (a) Input semantic layouts. (b) Synthesized images. Affiliations: † Intel Labs, ‡ Stanford University.
[...] demonstrate this by synthesizing photographic images at 2-megapixel resolution, the full resolution of our training data. Extensive perceptual experiments on datasets of outdoor and indoor scenes demonstrate that images synthesized by the presented approach are considerably more realistic than those produced by alternative approaches.
This paper proposes to apply the nonlocal principle to general alpha matting for the simultaneous extraction of multiple image layers; each layer may have disjoint as well as coherent segments typical of foreground mattes in natural image matting. The estimated alphas also satisfy the summation constraint. As in nonlocal matting, our approach does not assume the local color-line model and does not require sophisticated sampling or learning strategies. On the other hand, our matting method generalizes well to any color or feature space in any dimension, any number of alphas and layers at a pixel beyond two, and comes with an arguably simpler implementation, which we have made publicly available. Our matting technique, aptly called KNN matting, capitalizes on the nonlocal principle by using $K$ nearest neighbors (KNN) in matching nonlocal neighborhoods, and contributes a simple and fast algorithm that produces competitive results with sparse user markups. KNN matting has a closed-form solution that can leverage the preconditioned conjugate gradient method to produce an efficient implementation. Experimental evaluation on benchmark datasets indicates that our matting results are comparable to or of higher quality than state-of-the-art methods requiring more involved implementation. In this paper, we take the nonlocal principle beyond alpha estimation and extract overlapping image layers using the same Laplacian framework. Given the alpha value, our closed-form solution can be elegantly generalized to solve the multilayer extraction problem. We perform qualitative and quantitative comparisons to demonstrate the accuracy of the extracted image layers.
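The closed-form solve mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' released implementation. It assumes an affinity of the form 1 minus normalized feature distance, a soft-constraint system (L + lam * D) alpha = lam * v built from a graph Laplacian L and a diagonal marker matrix D, and a simple Jacobi (diagonal) preconditioner. The names `knn_affinity`, `solve_alpha`, `lam`, and the toy 1D "pixels" are all hypothetical.

```python
import math

def knn_affinity(features, k):
    """Symmetric affinity built from each point's k nearest neighbours in feature space."""
    n = len(features)
    dist = [[math.dist(features[i], features[j]) for j in range(n)] for i in range(n)]
    scale = max(max(row) for row in dist) or 1.0
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in nearest:
            weight = 1.0 - dist[i][j] / scale  # assumed kernel: 1 - normalised distance
            affinity[i][j] = affinity[j][i] = max(affinity[i][j], weight)
    return affinity

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_alpha(affinity, marked, values, lam=100.0, max_iters=200, tol=1e-12):
    """Solve (L + lam*D) alpha = lam*D*v with Jacobi-preconditioned conjugate gradient."""
    n = len(affinity)
    # H = Laplacian (degree minus affinity) plus lam on the diagonal of marked pixels.
    H = [[-affinity[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        H[i][i] = sum(affinity[i]) + lam * marked[i]
    b = [lam * marked[i] * values[i] for i in range(n)]
    diag = [H[i][i] for i in range(n)]  # assumes every pixel has a nonzero diagonal entry
    x = [0.0] * n
    r = b[:]
    z = [ri / di for ri, di in zip(r, diag)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iters):
        if rz < tol:
            break
        Hp = [dot(row, p) for row in H]
        step = rz / dot(p, Hp)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * hi for ri, hi in zip(r, Hp)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_next = dot(r, z)
        p = [zi + (rz_next / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_next
    return x

# Toy data: seven "pixels" described by one grey value each.
features = [(0.0,), (0.05,), (0.1,), (0.5,), (0.9,), (0.95,), (1.0,)]
marked = [1, 0, 0, 0, 0, 0, 1]                 # user scribbles on the first and last pixel
values = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]   # background = 0, foreground = 1
alpha = solve_alpha(knn_affinity(features, k=2), marked, values)
```

A dense matrix keeps the sketch short. A practical version would store the affinity sparsely, since each pixel has only on the order of k neighbours.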
Figure 1. We present an approach to approximating image processing operators. This figure shows the results for five operators: L0 gradient minimization, multiscale tone manipulation, photographic style transfer, nonlocal dehazing, and pencil drawing. All operators are approximated by the same model, with the same set of parameters and the same flow of computation. (Panels: Input; Our result for L0 smoothing, Multiscale tone, Photographic style, Nonlocal dehazing, Pencil drawing.)
* Joint first authors.
Abstract: We present an approach to accelerating a wide variety of image processing operators. Our approach uses a fully-convolutional network that is trained on input-output pairs that demonstrate the operator's action. After training, the original operator need not be run at all. The trained network operates at full resolution and runs in constant time. We investigate the effect of network architecture on approximation accuracy, runtime, and memory footprint, and identify a specific architecture that balances these considerations. We evaluate the presented approach on ten advanced image processing operators, including multiple variational models, multiscale tone and detail manipulation, photographic style transfer, nonlocal dehazing, and nonphotorealistic stylization. All operators are approximated by the same model. Experiments demonstrate that the presented approach is significantly more accurate than prior approximation schemes. It increases approximation accuracy as measured by PSNR across the evaluated operators by 8.5 dB on the MIT-Adobe dataset (from 27.5 to 36 dB) and reduces DSSIM by a multiplicative factor of 3 compared to the most accurate prior approximation scheme, while being the fastest. We show that our models generalize across datasets and across resolutions, and investigate a number of extensions of the presented approach.
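To make the PSNR figures above concrete, the following minimal sketch computes PSNR from mean squared error and converts a gain in dB into the corresponding MSE reduction factor. It assumes images flattened to lists of intensities in [0, 1]. The function names and the `peak` parameter are illustrative, not taken from the paper, and DSSIM is not covered.

```python
import math

def psnr(reference, approximation, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized intensity lists."""
    mse = sum((r - a) ** 2 for r, a in zip(reference, approximation)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db (same peak value)."""
    return 10.0 ** (gain_db / 10.0)

# The reported improvement from 27.5 dB to 36 dB is a gain of 8.5 dB.
ratio = mse_ratio_for_gain(36.0 - 27.5)

# Toy check: a constant error of 0.1 on a [0, 1] signal gives an MSE of 0.01.
toy_psnr = psnr([0.0, 0.5, 1.0], [0.1, 0.6, 1.1])
```

Because PSNR is logarithmic in MSE, an 8.5 dB gain corresponds to roughly a sevenfold reduction in mean squared error.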
We present a model for intrinsic decomposition of RGB-D images. Our approach analyzes a single RGB-D image and estimates albedo and shading fields that explain the input. To disambiguate the problem, our model estimates a number of components that jointly account for the reconstructed shading. By decomposing the shading field, we can build in assumptions about image formation that help distinguish reflectance variation from shading. These assumptions are expressed as simple nonlocal regularizers. We evaluate the model on real-world images and on a challenging synthetic dataset. The experimental results demonstrate that the presented approach outperforms prior models for intrinsic decomposition of RGB-D images.
We present a global optimization approach to optical flow estimation. The approach optimizes a classical optical flow objective over the full space of mappings between discrete grids. No descriptor matching is used. The highly regular structure of the space of mappings enables optimizations that reduce the computational complexity of the algorithm's inner loop from quadratic to linear and support efficient matching of tens of thousands of nodes to tens of thousands of displacements. We show that one-shot global optimization of a classical Horn-Schunck-type objective over regular grids at a single resolution is sufficient to initialize continuous interpolation and achieve state-of-the-art performance on challenging modern benchmarks.
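The abstract does not spell out how the inner loop drops from quadratic to linear cost. One standard way such a reduction arises on regular label grids is min-convolution with an L1 penalty, which a two-pass sweep computes in linear time. The sketch below illustrates that general technique under this assumption. It is not the paper's algorithm, and the names `min_conv_bruteforce`, `min_conv_linear`, `h`, and `c` are hypothetical.

```python
def min_conv_bruteforce(h, c):
    """m[p] = min over q of h[q] + c * |p - q|, computed in O(n^2)."""
    n = len(h)
    return [min(h[q] + c * abs(p - q) for q in range(n)) for p in range(n)]

def min_conv_linear(h, c):
    """Same result in O(n): one forward and one backward sweep over the labels."""
    m = list(h)
    for p in range(1, len(m)):            # propagate costs from smaller labels
        m[p] = min(m[p], m[p - 1] + c)
    for p in range(len(m) - 2, -1, -1):   # propagate costs from larger labels
        m[p] = min(m[p], m[p + 1] + c)
    return m

# Toy per-displacement costs along one axis, with a slope of 2 per unit of displacement.
costs = [5, 1, 7, 9, 2, 8, 0, 6]
slow = min_conv_bruteforce(costs, 2)
fast = min_conv_linear(costs, 2)
```

Both functions return the same list. Only the number of operations differs, which matters when each node is matched against tens of thousands of displacements.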
We present an approach to nonrigid registration of 3D surfaces. We cast isometric embedding as MRF optimization and apply efficient global optimization algorithms based on linear programming relaxations. The Markov random field perspective suggests a natural connection with robust statistics and motivates robust forms of the intrinsic distortion functional. Our approach outperforms a large body of prior work by a significant margin, increasing registration precision on real data by a factor of 3.
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly. Given input audio containing speech corrupted by an additive background signal, the system aims to produce a processed signal that contains only the speech content. Recent approaches have shown promising results using various deep network architectures. In this paper, we propose to train a fully-convolutional context aggregation network using a deep feature loss. That loss is based on comparing the internal feature activations in a different network, trained for acoustic environment detection and domestic audio tagging. Our approach outperforms the state of the art in objective speech quality metrics and in large-scale perceptual experiments with human listeners. It also outperforms an identical network trained using traditional regression losses. The advantage of the new approach is particularly pronounced for the hardest data with the most intrusive background noise, for which denoising is most needed and most challenging.