2021
DOI: 10.48550/arxiv.2107.04562
Preprint

The Bayesian Learning Rule

Abstract: We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms …
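The abstract above is truncated, but the update it alludes to can be sketched as a natural-gradient step on the natural parameter of a candidate distribution. The notation below (loss ℓ̄, candidate family q_λ with natural parameter λ, entropy H, natural gradient ∇̃, step size ρ) is the way the rule is usually written and is filled in here as an assumption rather than quoted from the paper:

\lambda_{t+1} \;=\; \lambda_t \;-\; \rho_t\, \widetilde{\nabla}_{\lambda}\Big( \mathbb{E}_{q_{\lambda_t}}\!\big[\bar{\ell}(\theta)\big] \;-\; \mathcal{H}(q_{\lambda_t}) \Big)

Choosing different candidate families q_λ (for example, Gaussians with fixed or learned covariance) and different approximations of the expectation is what recovers algorithms such as ridge regression, Newton's method, SGD, or RMSprop as special cases.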

Cited by 13 publications (21 citation statements)
References 57 publications
“…Finally, note that many strategies other than EWA are studied: the online gradient algorithm, follow-the-regularized-leader (FTRL), online mirror descent (OMD), ... EWA is actually derived as a special case of FTRL and OMD in many references (e.g. [162]), but conversely, [90, 98] derive OMD and the online gradient algorithm as EWA applied with various approximations.…”
Section: Online Learning 6.3.1 Sequential Prediction (mentioning)
confidence: 99%
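For context on the EWA strategy discussed in this excerpt, here is a minimal, generic sketch of the exponentially weighted average update over a finite set of experts; the function name, the learning rate eta, and the toy losses are illustrative choices, not taken from the cited works:

import numpy as np

def ewa_update(weights, losses, eta=0.5):
    # Exponentially weighted average (EWA): multiply each expert's weight
    # by exp(-eta * loss) and renormalise so the weights sum to one.
    w = weights * np.exp(-eta * losses)
    return w / w.sum()

# Toy usage: three experts starting from uniform weights.
w = np.ones(3) / 3
w = ewa_update(w, np.array([0.2, 1.0, 0.5]), eta=0.5)

The FTRL/OMD derivations mentioned above recover exactly this multiplicative update when the regulariser (or mirror map) is the negative entropy over the probability simplex.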
“…In this work we do not use or expand on the algorithmic viewpoint or optimization strategy of VB, but rather use the fundamental idea as proposed by [47], similar to the Bayesian learning rule (BLR) by [17]. [47] showed that the learned post-data model q(θ|D) of θ, based on information D (comprising prior information I encoded in π(θ|I) and information from the data y encoded in l(θ|y)), obtained by using an optimal and efficient information-processing rule, is the posterior model as defined through Bayes' theorem.…”
Section: Variational Gaussian Approximation (VGA) (mentioning)
confidence: 99%
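A standard way to write the optimisation view referred to in this excerpt (stated here in generic notation, not necessarily that of [47] or [17]) is that the post-data model solves a free-energy minimisation whose unrestricted solution is exactly Bayes' posterior:

\min_{q}\; \mathbb{E}_{q}\big[-\log l(\theta \mid y)\big] \;+\; \mathrm{KL}\big(q(\theta)\,\|\,\pi(\theta \mid I)\big), \qquad q^{*}(\theta) \;\propto\; l(\theta \mid y)\,\pi(\theta \mid I).

Restricting q to a Gaussian family, instead of leaving it unrestricted, is what yields a variational Gaussian approximation (VGA).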
“…In this framework, the algorithms derived in non-Bayesian settings are understood as special cases where the temperature parameter is set to zero so that the entropy term in the cost function vanishes. More important to the context of this study is that [3] noted the connection of the BLR to online learning.…”
Section: Introduction (mentioning)
confidence: 98%
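The zero-temperature limit described in this excerpt can be made explicit with a tempered version of the BLR objective (notation assumed, as above):

\min_{q \in \mathcal{Q}}\; \mathbb{E}_{q}\big[\bar{\ell}(\theta)\big] \;-\; \tau\, \mathcal{H}(q)

With τ = 1 this is the usual Bayesian/variational objective; as τ → 0 the entropy term vanishes and the minimiser concentrates on a point estimate, recovering the corresponding non-Bayesian algorithm.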
“…A general way of understanding the Bayesian principle through the reformulated, sequential, optimisation-oriented view was originally popularised in [1], [2] as the principle of maximum entropy. Starting from this fundamental principle, the work in [3] showed that many machine-learning methods in common use have a Bayesian nature. This gives rise to the known benefits in robustness and flexibility of learning algorithms in the real world, where information is presented not all at once and the world keeps changing.…”
Section: Introduction (mentioning)
confidence: 99%