Variational Lossy Autoencoder
2016 | Preprint
DOI: 10.48550/arxiv.1611.02731

Abstract: Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN. O…
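The combination described in the abstract can be sketched concretely. Below is a minimal, illustrative PyTorch sketch, not the authors' code: the layer sizes, the MNIST-shaped binarized input, and all names such as LossyVAE and MaskedConv2d are assumptions. It shows a VAE whose decoder is a PixelCNN-style autoregressive model conditioned on a global latent code z, so that z only needs to carry global structure while the masked convolutions model local detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """PixelCNN-style masked convolution: each output pixel may only see
    pixels that precede it in raster-scan order ('A' also hides the center)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        offset = 1 if mask_type == 'B' else 0
        mask[kh // 2, kw // 2 + offset:] = 0   # hide pixels to the right
        mask[kh // 2 + 1:, :] = 0              # hide rows below
        self.register_buffer('mask', mask[None, None])

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

class LossyVAE(nn.Module):
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 400), nn.ReLU(),
                                 nn.Linear(400, 2 * z_dim))            # q(z|x)
        self.cond = nn.Linear(z_dim, 64 * 28 * 28)                     # broadcast z to feature maps
        self.in_conv = MaskedConv2d('A', 1, 64, 7, padding=3)          # sees past pixels only
        self.mid_conv = MaskedConv2d('B', 64, 64, 7, padding=3)
        self.out_conv = MaskedConv2d('B', 64, 1, 7, padding=3)

    def forward(self, x):                                              # x: (B, 1, 28, 28), binarized
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # reparameterization trick
        h = F.relu(self.in_conv(x) + self.cond(z).view(-1, 64, 28, 28))  # past pixels + global code
        logits = self.out_conv(F.relu(self.mid_conv(h)))               # p(x_i | x_<i, z)
        rec = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q(z|x) || N(0, I))
        return (rec + kl) / x.size(0)                                  # negative ELBO per example
```

The paper's argument is that limiting the autoregressive decoder's local receptive field forces global information into z; this sketch only illustrates the overall wiring of such a model.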

Cited by 90 publications (96 citation statements)
References 22 publications (33 reference statements)
“…Training VRAG involves optimizing two objectives: reducing the KL-divergence between the document-prior and document-posterior, and maximizing the log likelihood of the responses. VAE models often end up prioritizing the KL-divergence over the likelihood objective and sometimes end up with zero KL-divergence by forcing the document-posterior to match the prior (called posterior collapse) (Lucas et al. 2019; Bowman et al. 2015; Chen et al. 2016; Oord, Vinyals, and Kavukcuoglu 2017). However, we hypothesize that even in cases where there is no posterior collapse, the joint training could result in the response-generator (likelihood term) being inadequately trained.…”
Section: Effect Of Decoder Fine-tuning
confidence: 89%
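For context on the trade-off this passage describes, here is a minimal, illustrative sketch (assuming a Gaussian encoder in PyTorch; the function names and the collapse threshold are assumptions, not taken from the cited works) of the two competing ELBO terms and a crude posterior-collapse check:

```python
import torch
import torch.nn.functional as F

def elbo_terms(x, x_logits, mu, logvar):
    """Return (reconstruction NLL, KL(q(z|x) || N(0, I))), averaged per example."""
    rec = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum') / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return rec, kl

def is_posterior_collapsed(kl, tol=1e-2):
    # KL near zero means q(z|x) has matched the prior for (almost) every x,
    # so z carries essentially no information about the input.
    return kl.item() < tol
```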
“…There has also been a variety of work on applying diffusion to the latent space of a VAE (Vahdat et al., 2021; Mittal et al., 2021; Wehenkel & Louppe, 2021; Sinha et al., 2021). Similarly, there have been various works that use flow priors for VAEs (Chen et al., 2016; Huang et al., 2017; Xiao et al., 2019). These are in contrast to our work, which applies diffusion and flows to functional representations.…”
Section: Related Work
confidence: 95%
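As an aside on the "flow priors for VAEs" mentioned above, this is a hypothetical minimal sketch (PyTorch; a single affine coupling layer, with all names and sizes assumed purely for illustration) of a learnable prior p(z) whose density is evaluated by inverting the flow and adding the change-of-variables term:

```python
import torch
import torch.nn as nn

class AffineCouplingPrior(nn.Module):
    """Toy flow prior p(z): invert one affine coupling layer onto a N(0, I) base
    density and account for the log-Jacobian of the transformation."""
    def __init__(self, z_dim=32, hidden=64):
        super().__init__()
        self.d = z_dim // 2
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (z_dim - self.d)))

    def log_prob(self, z):
        z1, z2 = z[:, :self.d], z[:, self.d:]
        shift, log_scale = self.net(z1).chunk(2, dim=1)
        log_scale = torch.tanh(log_scale)              # keep scales well-behaved
        u2 = (z2 - shift) * torch.exp(-log_scale)      # inverse of the coupling map
        u = torch.cat([z1, u2], dim=1)
        base = torch.distributions.Normal(0., 1.).log_prob(u).sum(dim=1)
        return base - log_scale.sum(dim=1)             # change-of-variables correction
```

With such a prior, the analytic Gaussian KL in the ELBO is replaced by a Monte Carlo estimate of E_q[log q(z|x) − log p(z)] using the sampled z.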
“…On the other hand, giving the KL divergence a small coefficient indeed helps to retain more information about the input data, but it may destroy the consistency between the learned encoder distribution and the prior distribution (see Figure 1(a)). Samples from the inconsistent region, which lies between the encoder distribution (blue regions) and the prior (yellow region), will cause poor generation quality [6].…”
Section: Introduction
confidence: 99%
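The "small coefficient" in this passage is the usual beta-style weighting of the KL term. A minimal, illustrative sketch of the trade-off it introduces (the name beta_elbo_loss and the default value are assumptions, not from the cited paper):

```python
def beta_elbo_loss(rec, kl, beta=0.2):
    # beta < 1 keeps more information about x in z (better reconstructions),
    # but lets q(z|x) drift away from p(z), so samples drawn from the prior
    # at generation time can fall in regions the decoder was never trained on.
    return rec + beta * kl
```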
“…Although these works can produce more flexible and powerful approximations of the posterior in the latent space than the vanilla VAE, the optimization objectives of the encoder and decoder still retain their original problems. In fact, the KL divergence that forces every posterior close to the prior distribution is equivalent to making the posterior irrelevant to the input data [6,33]. However, the decoder's loss needs information related to the data to ensure the quality of the reconstruction [28].…”
Section: Introduction
confidence: 99%
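The equivalence asserted in this passage can be made precise with the standard decomposition of the averaged KL term (a textbook identity, not specific to the cited works):

\[
\mathbb{E}_{p_{\mathrm{data}}(x)}\!\left[ D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big) \right]
= I_q(x; z) + D_{\mathrm{KL}}\big(q_\phi(z)\,\|\,p(z)\big) \;\ge\; I_q(x; z),
\]

where $q_\phi(z) = \mathbb{E}_{p_{\mathrm{data}}(x)}\left[q_\phi(z \mid x)\right]$ is the aggregated posterior. Driving the KL term to zero for every $x$ forces $q_\phi(z \mid x) = p(z)$ for all $x$, so $z$ becomes independent of $x$ and $I_q(x; z) = 0$: the posterior then carries no information about the input, which is exactly the tension with the reconstruction term noted in the quote.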