Twenty-First International Conference on Machine Learning (ICML '04), 2004
DOI: 10.1145/1015330.1015396

Approximate inference by Markov chains on union spaces

Cited by 121 publications (165 citation statements)
References 3 publications
“…Hence, if the variational frequentist estimate is consistent, then the variational Bayes posterior converges to a Gaussian with a mean centred at the true model parameter. Furthermore, since variational Bayes rests on optimization, variational inference easily takes advantage of methods such as stochastic optimization (Robbins and Monro 1951, Kushner and Yin 1997) and distributed optimization (though some MCMC methods can also exploit these innovations (Welling and Teh 2011, Ahmed et al. 2012)).…”
Section: Statistical Regularization (citation type: mentioning)
Confidence: 99%
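The statement above contrasts variational inference, which rests on stochastic optimization in the Robbins and Monro (1951) sense, with MCMC methods such as SGLD that can exploit the same idea. As a point of reference, the following is a minimal sketch of a Robbins-Monro stochastic gradient update with the classic decaying step-size schedule; the toy objective, data and constants are illustrative assumptions, not taken from the cited papers.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=3.0, scale=1.0, size=10_000)  # toy observations with unknown mean

    theta = 0.0  # parameter being estimated (the mean)
    for t in range(1, 2001):
        x = data[rng.integers(len(data))]  # one randomly drawn observation per step
        grad = theta - x                   # noisy gradient of the per-sample loss 0.5*(theta - x)**2
        step = 1.0 / t                     # Robbins-Monro schedule: sum(step) diverges, sum(step**2) converges
        theta -= step * grad               # stochastic approximation update

    print(theta)  # approaches the sample mean, roughly 3.0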
“…We quantified the effect of circular filters on motif recognition further by comparing network architectures with and without circular filters for a variety of hyperparameter combinations. These included the number of positive training examples, L2-regularization strength and the amount of noise injected into parameter updates via SGLD (Welling and Teh, 2011). To investigate the effect of the weighted sum of activations that appears in the CNN with circular filters (Fig.…”
Section: Results (citation type: mentioning)
Confidence: 99%
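For context on the circular filters mentioned above: a circular filter scores a sequence with wrap-around padding, so a window that spans the sequence boundary still produces a full activation. The sketch below is a plain NumPy illustration of that idea on assumed toy inputs; it is not the cited network architecture, and the function name circular_conv1d is ours.

    import numpy as np

    def circular_conv1d(x, w):
        # Cross-correlate filter w with sequence x using wrap-around (circular) padding,
        # so windows that cross the sequence boundary are still scored.
        k = len(w)
        x_wrapped = np.concatenate([x, x[:k - 1]])  # append the first k-1 positions to the end
        return np.array([x_wrapped[i:i + k] @ w for i in range(len(x))])

    x = np.array([0.2, 1.0, 0.1, 0.0, 0.3, 0.9])  # toy per-position signal
    w = np.array([0.5, 1.0, 0.5])                 # toy filter acting as a motif detector
    print(circular_conv1d(x, w))                  # one activation per position, boundary included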
“…Model training. Models were trained by minimizing the cross-entropy between network outputs and sequence labels using either mini-batch stochastic gradient descent or Stochastic Gradient Langevin Dynamics (SGLD) (Welling and Teh, 2011), depending on the experiment. When SGLD was used, the magnitude of the noise injected into the gradients was scaled by a factor γ, resulting in γ·ϵ·N(0, 1) as the injected noise, with ϵ as the learning rate.…”
Section: Methods (citation type: mentioning)
Confidence: 99%
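The statement above describes SGLD-style training in which the injected Gaussian noise is scaled by a factor γ times the learning rate ϵ. Below is a minimal sketch of such an update on a toy quadratic loss; note that the canonical SGLD of Welling and Teh (2011) instead injects noise whose variance equals the step size, and the function name, loss and hyperparameter values here are illustrative assumptions rather than the cited setup.

    import numpy as np

    rng = np.random.default_rng(0)

    def sgld_like_step(theta, grad, eps, gamma, rng):
        # Gradient step plus Gaussian noise scaled by gamma * eps,
        # following the scaling described in the quoted statement.
        noise = gamma * eps * rng.standard_normal(theta.shape)
        return theta - eps * grad + noise

    theta = np.ones(4)                     # toy parameters
    for _ in range(1000):
        grad = theta                       # gradient of the toy loss 0.5 * ||theta||**2
        theta = sgld_like_step(theta, grad, eps=0.01, gamma=0.1, rng=rng)

    print(theta)  # hovers near the minimum at zero, jittered by the injected noise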
“…It should be emphasized that we should not consider these methods to be in pure competition; instead, they can be used to complement each other. For example, stochastic gradient Langevin dynamics (SGLD) [40] can be viewed as a combination of gradient descent and annealing, and in [41] it is mentioned that inclusion of the deterministic hill climber (a discrete version of gradient descent) can lead to a substantial speedup in the PMBGA.…”
Section: Discussion (citation type: mentioning)
Confidence: 99%