2019
DOI: 10.1016/j.neunet.2018.09.013

Weighted contrastive divergence

Abstract: Learning algorithms for energy-based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically due to the exponential number of terms involved in computing the partition function. One therefore has to resort to approximation schemes for the evaluation of the gradient. This is the case for Restricted Boltzmann Machines (RBMs) and their learning algorithm, Contrastive Divergence (CD). It is well known that CD has a number of shortcomings, and its approximation to th…
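For readers unfamiliar with CD, the following is a minimal sketch of one CD-1 update for a binary RBM, the approximation scheme the abstract refers to. All names (`cd1_update`, `W`, `b`, `c`) are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01):
    """One CD-1 step: positive phase from the data, negative phase from one Gibbs step."""
    # Positive phase: hidden probabilities given the data batch v0.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visibles once, then recompute hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient: data statistics minus one-step reconstruction statistics.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
```

The single Gibbs step is exactly the approximation CD is criticized for: the negative-phase statistics are drawn far from equilibrium.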

Cited by 15 publications (10 citation statements). References 21 publications (28 reference statements).
“…[7], the authors study, in a systematic way, the convergence properties of CD, PCD and PT on several small toy models that can be analyzed exactly, that is, where the LL can be computed by brute force. More recent works [32,33,34,35] improve the learning scheme for RBMs, yet still give little information about the quality of the generated samples or the equilibrium properties of the trained models. In our results below, we will show that without this information, the comparison between methods or the tuning of parameters becomes extremely unstable.…”
Section: Related Work
confidence: 99%
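The brute-force LL evaluation mentioned in this statement is feasible only on toy models; a sketch of what it involves for a binary RBM is below. Function and variable names are assumptions, not taken from the cited works.

```python
import itertools
import numpy as np

def free_energy(v, W, b, c):
    # Binary-RBM free energy: F(v) = -v.b - sum_j log(1 + exp(v.W_j + c_j)).
    return -v @ b - np.logaddexp(0.0, v @ W + c).sum(axis=-1)

def exact_log_likelihood(data, W, b, c):
    n_visible = W.shape[0]
    # Enumerate all 2^n_visible visible configurations; the partition function
    # is only tractable by brute force on small models like these.
    all_v = np.array(list(itertools.product([0.0, 1.0], repeat=n_visible)))
    log_Z = np.logaddexp.reduce(-free_energy(all_v, W, b, c))
    return (-free_energy(data, W, b, c)).mean() - log_Z
```

The exponential cost of `log_Z` is the same exponential sum the abstract identifies as the reason gradient descent is prohibitive in general.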
“…A number of recent works have explored the parity dataset using restricted Boltzmann machines (RBMs) and found it to be difficult to learn, even in experiments that train using the entire dataset [11, 21]. Recall that an RBM is a universal approximator of distributions on $\{0,1\}^n$, given sufficiently many hidden units.…”
Section: Discussion
confidence: 99%
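For concreteness, the n-bit parity dataset discussed in this and the following statement can be generated as below. Treating the parity bit as an extra visible unit is one common setup and an assumption here, not necessarily the exact construction used in [11, 21].

```python
import itertools
import numpy as np

def parity_dataset(n_bits):
    # All 2^n binary strings, each extended with its parity (XOR of all bits).
    x = np.array(list(itertools.product([0, 1], repeat=n_bits)))
    parity = x.sum(axis=1) % 2
    return np.column_stack([x, parity])
```

Under this setup, `parity_dataset(4)` yields 16 rows of 5 visible units each, and the RBM is asked to model the uniform distribution over those rows.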
“…The dataset can be frustrating to learn for other models, such as restricted Boltzmann machines (RBMs) trained with gradient-based methods. The difficulty of training RBMs to learn parity with contrastive divergence and related training algorithms is noted in [11]. The difficulty of other gradient-based deep-learning methods on parity problems has been studied in [12].…”
Section: Introduction
confidence: 99%
“…But unlike CD, PCD keeps a persistent chain to estimate the negative gradient. Many CD variants have been proposed to improve the negative-gradient estimation, as in [13], but almost all are based on a persistent chain [14], [15], [16].…”
Section: Persistent Contrastive Divergence
confidence: 99%
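A minimal sketch of the persistent-chain idea described above, in the same hypothetical notation as the CD-1 sketch earlier: the only change from CD is that the negative chain `v_chain` survives across parameter updates instead of restarting at the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_update(v0, v_chain, W, b, c, lr=0.01, k=1):
    """One PCD step: positive phase from the data, negative phase from the persistent chain."""
    ph0 = sigmoid(v0 @ W + c)            # positive phase: statistics from the data batch
    v = v_chain
    for _ in range(k):                   # advance the persistent chain by k Gibbs steps
        ph = sigmoid(v @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
    ph = sigmoid(v @ W + c)
    W += lr * (v0.T @ ph0 - v.T @ ph) / v0.shape[0]
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph).mean(axis=0)
    return v                             # carry the chain state into the next update
```

Returning `v` and feeding it back in as `v_chain` on the next call is what makes the chain persistent.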
“…(13) where $b^a$ and $b^s$ are the biases of $a_t$ and $s_{t+1}$, respectively; $W_{\bullet F_1}$ and $W_{\bullet F_2}$ are the factorization weights w.r.t. the first and second factor, respectively; the dynamic bias is defined by $\hat{b}^h_k = b^h_k + s_t B_{\bullet k}$; and $\circ$ corresponds to element-wise matrix multiplication.…”
confidence: 99%
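As a small illustration of the dynamic bias in the reconstructed snippet above, $\hat{b}^h_k = b^h_k + s_t B_{\bullet k}$ is just a state-dependent shift of each hidden bias; the shapes and names below are assumptions, since the quoted context is fragmentary.

```python
import numpy as np

def dynamic_hidden_bias(b_h, s_t, B):
    """b_h: static hidden biases (n_hidden,); s_t: previous state (n_state,);
    B: (n_state, n_hidden). Returns b_hat_h with b_hat_h[k] = b_h[k] + s_t . B[:, k]."""
    return b_h + s_t @ B
```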