This paper is the first work to propose a network that predicts a structured uncertainty distribution for a synthesized image. Previous approaches have been mostly limited to predicting diagonal covariance matrices [15]. Our novel model learns to predict a full Gaussian covariance matrix for each reconstruction, which permits efficient sampling and likelihood evaluation. We demonstrate that our model can accurately reconstruct ground-truth correlated residual distributions for synthetic datasets and generate plausible high-frequency samples for real face images. We also illustrate the use of these predicted covariances for structure-preserving image denoising.
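To illustrate why a full Gaussian covariance can still allow cheap sampling and likelihood evaluation, here is a minimal NumPy sketch. It assumes the precision matrix is parameterized by a lower-triangular Cholesky factor `L` with positive diagonal, so that Σ⁻¹ = L Lᵀ. That parameterization, and the function names `gaussian_log_likelihood` and `gaussian_sample`, are assumptions made for illustration; the abstract above does not specify them.

```python
import numpy as np

def gaussian_log_likelihood(x, mu, L):
    """log N(x; mu, Sigma), assuming Sigma^{-1} = L @ L.T with L lower triangular."""
    n = x.shape[0]
    r = L.T @ (x - mu)  # r @ r equals (x - mu)^T Sigma^{-1} (x - mu)
    # log det(Sigma^{-1}) = 2 * sum(log(diag(L))); no matrix inversion is needed.
    log_det_precision = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (log_det_precision - n * np.log(2.0 * np.pi) - r @ r)

def gaussian_sample(mu, L, rng, num_samples=1):
    """Draw samples with covariance (L @ L.T)^{-1}; returns shape (n, num_samples)."""
    n = mu.shape[0]
    z = rng.standard_normal((n, num_samples))
    # Solving L.T @ y = z gives cov(y) = L^{-T} L^{-1} = Sigma.
    # np.linalg.solve is a general solver; a dedicated triangular solver
    # would exploit the structure of L.T and be cheaper.
    y = np.linalg.solve(L.T, z)
    return mu[:, None] + y
```

With this choice the log-determinant is read off the diagonal of `L` and the quadratic term needs only one matrix-vector product, so neither operation forms Σ explicitly.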
(a) Input image (previously unseen) (b) User requested edit: "beak larger than head" (c) User requested edit: "beak smaller than head"
Figure 1: Semantic image editing at high resolution (2480×1850). The user requests a change in a semantic attribute and the input image (a) is automatically transformed by our method into, e.g., an image with "beak larger than head" (b) or "beak smaller than head" (c). The content of the original input, including fine details, is preserved. Our focus is on face editing, as in previous work, yet the method is general enough to be applied to other datasets. Please see the supplemental material for videos of these and other edits.
Figure 1: Image editing example. (a) An input image is decomposed using a Laplacian pyramid. The user paints into the coarse image. The input image is projected to a low-dimensional representation and reconstructed from it. (b) Several samples are generated conditioned on the painted coarse image. (c) The regions of interest in (b) are composed with the reconstructed image.

ABSTRACT
Variational Autoencoders (VAE) learn a latent representation of image data that allows natural image generation and manipulation. However, they struggle to generate sharp images. To address this problem, we propose a hierarchy of VAEs analogous to a Laplacian pyramid. Each network models a single pyramid level, and is conditioned on the coarser levels. The Laplacian architecture allows for novel image editing applications that take advantage of the coarse-to-fine structure of the model. Our method achieves lower reconstruction error in terms of MSE, which is the loss function of the VAE and is not directly minimised in our model. Furthermore, the reconstructions generated by the proposed model are preferred over those from the VAE by human evaluators.
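The Laplacian pyramid decomposition that this hierarchy mirrors can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: it assumes a single-channel image whose sides are divisible by 2 for every level, and it swaps the Gaussian filter of the classical pyramid for 2×2 block averaging with nearest-neighbour upsampling. The function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Return [residual_0, ..., residual_{levels-1}, coarse_image]."""
    pyramid = []
    current = img
    for _ in range(levels):
        coarse = downsample(current)
        # The residual stores the detail lost when moving to the coarser level.
        pyramid.append(current - upsample(coarse))
        current = coarse
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert build_pyramid by adding residuals back from coarse to fine."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = upsample(current) + residual
    return current
```

In a setup like the one described above, each residual level would be handled by its own network conditioned on the coarser levels, and an edit painted into the coarse image propagates to the full resolution through `reconstruct`.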