Tighter Variational Bounds are Not Necessarily Better
2018 · Preprint
DOI: 10.48550/arxiv.1802.04537

Cited by 17 publications (32 citation statements) · References 0 publications
“…During training we used a K=100 importance weighted auto-encoder (IWAE) estimator for gradient computation (Burda et al, 2015). We implemented a doubly reparameterised gradient estimator (DReG) for IWAE, which has recently been shown to yield lower-variance gradient estimates than the gradient computed naively by reverse-mode automatic differentiation (Tucker et al, 2018), while also avoiding the known problem of poor gradients for the variational approximation with greater numbers of importance samples (Rainforth et al, 2018). We found that the DReG estimator gave modest improvements for the black-box, but very significant improvements in performance and rate of convergence for the white-box (see Appendix figure 8).…”
Section: Architecture and Optimisation Details
confidence: 99%
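For context, here is a minimal sketch of the K-sample importance weighted (IWAE) bound of Burda et al. (2015) referenced in the passage above. This is not the cited authors' code; the `iwae_bound` helper, the PyTorch framing, and the tensor shapes are illustrative assumptions. It assumes the log importance weights log w_k = log p(x, z_k) − log q(z_k | x) have already been computed for K samples z_k ~ q(z | x).

```python
# Minimal sketch of the K-sample IWAE objective (Burda et al., 2015).
import math
import torch

def iwae_bound(log_w: torch.Tensor) -> torch.Tensor:
    """log_w: [batch, K] log importance weights.
    Returns the per-example bound log (1/K) sum_k w_k <= log p(x),
    computed stably via logsumexp."""
    K = log_w.shape[1]
    return torch.logsumexp(log_w, dim=1) - math.log(K)

# Hypothetical usage with K = 100 samples, as in the quoted passage.
log_w = torch.randn(8, 100)        # placeholder for real log-weights
loss = -iwae_bound(log_w).mean()   # maximise the bound = minimise its negation
```

The DReG estimator mentioned in the passage (Tucker et al., 2018) changes only how the gradient with respect to the inference network's parameters is estimated; the value of the bound itself is unchanged.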
“…(3). This is an important reason why tighter variational bounds are not necessarily better (Rainforth et al, 2018), and we therefore do not use it in place of the ELBO as the objective function. Some other graphical models, such as the variational Gaussian mixture model (Attias, 1999), have similar tighter lower bounds.…”
Section: Data Dependent Expectation (DDE)
confidence: 99%
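For reference, these are the two objectives contrasted in the passage above, written in standard notation; this is a sketch for the reader, and the citing paper's own Eq. (3) and notation may differ.

```latex
% Single-sample ELBO vs. the tighter K-sample importance weighted bound
\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi)
  = \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\right],
\qquad
\mathcal{L}_{K}(\theta,\phi)
  = \mathbb{E}_{z_{1:K}\sim q_\phi(z\mid x)}\!\left[\log \frac{1}{K}\sum_{k=1}^{K}
    \frac{p_\theta(x,z_k)}{q_\phi(z_k\mid x)}\right],
\qquad
\mathcal{L}_{\mathrm{ELBO}} \le \mathcal{L}_{K} \le \log p_\theta(x).
```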
“…Recent work [19] has shown that optimizing the importance weighted bound can degrade the overall learning process of the inference network, because the signal-to-noise ratio of the inference network's gradient estimates, SNR_k(φ), converges to 0 as k → ∞, and the gradient estimates of φ become completely random.
Section: IW-AVB and IW-AAE
confidence: 99%
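For reference, the signal-to-noise ratio discussed in this passage is defined in Rainforth et al. (2018) as the ratio of the expected gradient magnitude to its standard deviation; the asymptotic rates below summarise their headline result. The notation is adapted here rather than quoted from the citing paper.

```latex
% SNR of a K-sample gradient estimate \Delta_K (notation adapted)
\mathrm{SNR}_K
  = \frac{\left|\mathbb{E}\!\left[\Delta_K\right]\right|}{\sigma\!\left[\Delta_K\right]},
\qquad
\mathrm{SNR}_K(\phi) = O\!\left(1/\sqrt{K}\right) \ \text{(inference network)},
\qquad
\mathrm{SNR}_K(\theta) = O\!\left(\sqrt{K}\right) \ \text{(generative network)}.
```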