2021
DOI: 10.48550/arxiv.2107.04562
Preprint

The Bayesian Learning Rule

Abstract: We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. This includes classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms …
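The abstract above is truncated, but the update it alludes to can be sketched as a natural-gradient step on the natural parameter of a candidate distribution. The notation below (loss ℓ̄, candidate family q_λ with natural parameter λ, entropy H, natural gradient ∇̃, step size ρ) is the way the rule is usually written and is filled in here as an assumption rather than quoted from the paper:

\lambda_{t+1} \;=\; \lambda_t \;-\; \rho_t\, \widetilde{\nabla}_{\lambda}\Big( \mathbb{E}_{q_{\lambda_t}}\!\big[\bar{\ell}(\theta)\big] \;-\; \mathcal{H}(q_{\lambda_t}) \Big)

Choosing different candidate families q_λ (for example, Gaussians with fixed or learned covariance) and different approximations of the expectation is what recovers algorithms such as ridge regression, Newton's method, SGD, or RMSprop as special cases.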

Cited by 13 publications (21 citation statements)
References 57 publications
“…Finally, note that many strategies other than EWA are studied: the online gradient algorithm, follow-the-regularized-leader (FTRL), online mirror descent (OMD), ... EWA is actually derived as a special case of FTRL and OMD in many references (e.g. [162]), but conversely, [90, 98] derive OMD and the online gradient algorithm as EWA applied with various approximations.…”
Section: Online Learning 6.3.1 Sequential Prediction (mentioning)
confidence: 99%
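For context on the EWA strategy discussed in this excerpt, here is a minimal, generic sketch of the exponentially weighted average update over a finite set of experts; the function name, the learning rate eta, and the toy losses are illustrative choices, not taken from the cited works:

import numpy as np

def ewa_update(weights, losses, eta=0.5):
    # Exponentially weighted average (EWA): multiply each expert's weight
    # by exp(-eta * loss) and renormalise so the weights sum to one.
    w = weights * np.exp(-eta * losses)
    return w / w.sum()

# Toy usage: three experts starting from uniform weights.
w = np.ones(3) / 3
w = ewa_update(w, np.array([0.2, 1.0, 0.5]), eta=0.5)

The FTRL/OMD derivations mentioned above recover exactly this multiplicative update when the regulariser (or mirror map) is the negative entropy over the probability simplex.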
“…In this work we do not use or expand on the algorithmic viewpoint or optimization strategy of VB, but rather use the fundamental idea as proposed by [47], similar to the Bayesian learning rule (BLR) by [17]. [47] showed that the learned post-data model q(θ|D) of θ, based on information D (comprising prior information I encoded in π(θ|I) and information from the data y encoded in l(θ|y)), obtained by using an optimal and efficient information-processing rule, is the posterior model as defined through Bayes' theorem.…”
Section: Variational Gaussian Approximation (VGA) (mentioning)
confidence: 99%
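A standard way to write the optimisation view referred to in this excerpt (stated here in generic notation, not necessarily that of [47] or [17]) is that the post-data model solves a free-energy minimisation whose unrestricted solution is exactly Bayes' posterior:

\min_{q}\; \mathbb{E}_{q}\big[-\log l(\theta \mid y)\big] \;+\; \mathrm{KL}\big(q(\theta)\,\|\,\pi(\theta \mid I)\big), \qquad q^{*}(\theta) \;\propto\; l(\theta \mid y)\,\pi(\theta \mid I).

Restricting q to a Gaussian family, instead of leaving it unrestricted, is what yields a variational Gaussian approximation (VGA).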
“…In this framework, the algorithms derived in non-Bayesian settings are understood as special cases where the temperature parameter is set to zero so that the entropy term in the cost function vanishes. More important to the context of this study is that [3] noted the connection of the BLR to online learning.…”
Section: Introduction (mentioning)
confidence: 98%
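The zero-temperature limit described in this excerpt can be made explicit with a tempered version of the BLR objective (notation assumed, as above):

\min_{q \in \mathcal{Q}}\; \mathbb{E}_{q}\big[\bar{\ell}(\theta)\big] \;-\; \tau\, \mathcal{H}(q)

With τ = 1 this is the usual Bayesian/variational objective; as τ → 0 the entropy term vanishes and the minimiser concentrates on a point estimate, recovering the corresponding non-Bayesian algorithm.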
“…A general way of understanding the Bayesian principle through the reformulated, sequential, optimisation-oriented view was originally popularised in [1], [2] as the principle of maximum entropy. Starting from this fundamental principle, the work in [3] showed that many machine-learning methods in common use have a Bayesian nature. This gives rise to the known benefits in robustness and flexibility of learning algorithms in the real world, where information is presented not all at once and the world keeps changing.…”
Section: Introduction (mentioning)
confidence: 99%