Variational Lossy Autoencoder
2016 | Preprint
DOI: 10.48550/arxiv.1611.02731

Abstract: Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN. O…
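The combination described in the abstract can be sketched concretely. Below is a minimal, illustrative PyTorch sketch, not the authors' code: the layer sizes, the MNIST-shaped binarized input, and all names such as LossyVAE and MaskedConv2d are assumptions. It shows a VAE whose decoder is a PixelCNN-style autoregressive model conditioned on a global latent code z, so that z only needs to carry global structure while the masked convolutions model local detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """PixelCNN-style masked convolution: each output pixel may only see
    pixels that precede it in raster-scan order ('A' also hides the center)."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        offset = 1 if mask_type == 'B' else 0
        mask[kh // 2, kw // 2 + offset:] = 0   # hide pixels to the right
        mask[kh // 2 + 1:, :] = 0              # hide rows below
        self.register_buffer('mask', mask[None, None])

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

class LossyVAE(nn.Module):
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 400), nn.ReLU(),
                                 nn.Linear(400, 2 * z_dim))            # q(z|x)
        self.cond = nn.Linear(z_dim, 64 * 28 * 28)                     # broadcast z to feature maps
        self.in_conv = MaskedConv2d('A', 1, 64, 7, padding=3)          # sees past pixels only
        self.mid_conv = MaskedConv2d('B', 64, 64, 7, padding=3)
        self.out_conv = MaskedConv2d('B', 64, 1, 7, padding=3)

    def forward(self, x):                                              # x: (B, 1, 28, 28), binarized
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # reparameterization trick
        h = F.relu(self.in_conv(x) + self.cond(z).view(-1, 64, 28, 28))  # past pixels + global code
        logits = self.out_conv(F.relu(self.mid_conv(h)))               # p(x_i | x_<i, z)
        rec = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q(z|x) || N(0, I))
        return (rec + kl) / x.size(0)                                  # negative ELBO per example
```

The paper's argument is that limiting the autoregressive decoder's local receptive field forces global information into z; this sketch only illustrates the overall wiring of such a model.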

Cited by 90 publications (96 citation statements)
References 22 publications (33 reference statements)
“…Training VRAG involves optimizing two objectives: reducing the KL-divergence between the document-prior and document-posterior, and maximizing the log likelihood of the responses. VAE models often end up prioritizing the KL-divergence over the likelihood objective and sometimes end up with zero KL-divergence by forcing the document-posterior to match the prior (called posterior collapse) (Lucas et al. 2019; Bowman et al. 2015; Chen et al. 2016; Oord, Vinyals, and Kavukcuoglu 2017). However, we hypothesize that even in cases where there is no posterior collapse, the joint training could result in the response-generator (likelihood term) being inadequately trained.…”
Section: Effect Of Decoder Fine-tuning
confidence: 89%
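For context on the trade-off this passage describes, here is a minimal, illustrative sketch (assuming a Gaussian encoder in PyTorch; the function names and the collapse threshold are assumptions, not taken from the cited works) of the two competing ELBO terms and a crude posterior-collapse check:

```python
import torch
import torch.nn.functional as F

def elbo_terms(x, x_logits, mu, logvar):
    """Return (reconstruction NLL, KL(q(z|x) || N(0, I))), averaged per example."""
    rec = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum') / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return rec, kl

def is_posterior_collapsed(kl, tol=1e-2):
    # KL near zero means q(z|x) has matched the prior for (almost) every x,
    # so z carries essentially no information about the input.
    return kl.item() < tol
```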
“…There has also been a variety of work on applying diffusion to the latent space of a VAE (Vahdat et al., 2021; Mittal et al., 2021; Wehenkel & Louppe, 2021; Sinha et al., 2021). Similarly, there have been various works that use flow priors for VAEs (Chen et al., 2016; Huang et al., 2017; Xiao et al., 2019). These are in contrast to our work, which applies diffusion and flows to functional representations.…”
Section: Related Work
confidence: 95%
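As an aside on the "flow priors for VAEs" mentioned above, this is a hypothetical minimal sketch (PyTorch; a single affine coupling layer, with all names and sizes assumed purely for illustration) of a learnable prior p(z) whose density is evaluated by inverting the flow and adding the change-of-variables term:

```python
import torch
import torch.nn as nn

class AffineCouplingPrior(nn.Module):
    """Toy flow prior p(z): invert one affine coupling layer onto a N(0, I) base
    density and account for the log-Jacobian of the transformation."""
    def __init__(self, z_dim=32, hidden=64):
        super().__init__()
        self.d = z_dim // 2
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (z_dim - self.d)))

    def log_prob(self, z):
        z1, z2 = z[:, :self.d], z[:, self.d:]
        shift, log_scale = self.net(z1).chunk(2, dim=1)
        log_scale = torch.tanh(log_scale)              # keep scales well-behaved
        u2 = (z2 - shift) * torch.exp(-log_scale)      # inverse of the coupling map
        u = torch.cat([z1, u2], dim=1)
        base = torch.distributions.Normal(0., 1.).log_prob(u).sum(dim=1)
        return base - log_scale.sum(dim=1)             # change-of-variables correction
```

With such a prior, the analytic Gaussian KL in the ELBO is replaced by a Monte Carlo estimate of E_q[log q(z|x) − log p(z)] using the sampled z.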
“…On the other hand, giving the KL divergence a small coefficient indeed helps to retain more information about the input data, but it may destroy the consistency between the learned encoder distribution and the prior distribution (see Figure 1(a)). Samples from the inconsistent region, which lies between the encoder distribution (blue regions) and the prior (yellow region), will cause poor generation quality [6].…”
Section: Introduction
confidence: 99%
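The "small coefficient" in this passage is the usual beta-style weighting of the KL term. A minimal, illustrative sketch of the trade-off it introduces (the name beta_elbo_loss and the default value are assumptions, not from the cited paper):

```python
def beta_elbo_loss(rec, kl, beta=0.2):
    # beta < 1 keeps more information about x in z (better reconstructions),
    # but lets q(z|x) drift away from p(z), so samples drawn from the prior
    # at generation time can fall in regions the decoder was never trained on.
    return rec + beta * kl
```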
“…Although these works can produce more flexible and powerful approximations of the posterior in the latent space than the vanilla VAE, the optimization objectives of the encoder and decoder still retain their original problems. In fact, the KL divergence that forces every posterior close to the prior distribution is equivalent to making the posterior irrelevant to the input data [6,33]. However, the decoder's loss needs information related to the data to ensure the quality of the reconstruction [28].…”
Section: Introduction
confidence: 99%
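The equivalence asserted in this passage can be made precise with the standard decomposition of the averaged KL term (a textbook identity, not specific to the cited works):

\[
\mathbb{E}_{p_{\mathrm{data}}(x)}\!\left[ D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big) \right]
= I_q(x; z) + D_{\mathrm{KL}}\big(q_\phi(z)\,\|\,p(z)\big) \;\ge\; I_q(x; z),
\]

where $q_\phi(z) = \mathbb{E}_{p_{\mathrm{data}}(x)}\left[q_\phi(z \mid x)\right]$ is the aggregated posterior. Driving the KL term to zero for every $x$ forces $q_\phi(z \mid x) = p(z)$ for all $x$, so $z$ becomes independent of $x$ and $I_q(x; z) = 0$: the posterior then carries no information about the input, which is exactly the tension with the reconstruction term noted in the quote.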