2011
DOI: 10.1162/neco_a_00142

A Connection Between Score Matching and Denoising Autoencoders

Abstract: Denoising autoencoders have been previously shown to be competitive alternatives to Restricted Boltzmann Machines for unsupervised pre-training of each layer of a deep architecture. We show that a simple denoising autoencoder training criterion is equivalent to matching the score (with respect to the data) of a specific energy-based model to that of a non-parametric Parzen density estimator of the data. This yields several useful insights. It defines a proper probabilistic model for the denoising autoencoder t…
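
The abstract's central equivalence can be stated compactly. The LaTeX sketch below writes the denoising score matching objective in standard notation (sigma is the Gaussian corruption level, q_sigma the corruption density whose marginal is the Parzen estimator, psi the score of the energy-based model, r the reconstruction function); it paraphrases the paper's result rather than quoting its exact equations.

\begin{align}
  q_\sigma(\tilde{x} \mid x) &= \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I),
  \qquad
  \nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) = \frac{x - \tilde{x}}{\sigma^2}, \\
  J_{\mathrm{DSM}}(\theta) &= \mathbb{E}_{q_\sigma(x,\tilde{x})}\!\left[
      \tfrac{1}{2}\,\bigl\lVert \psi(\tilde{x};\theta)
      - \nabla_{\tilde{x}} \log q_\sigma(\tilde{x} \mid x) \bigr\rVert^2
    \right].
\end{align}

When the score is parameterized through the reconstruction, \psi(\tilde{x};\theta) = (r(\tilde{x};\theta) - \tilde{x})/\sigma^2, minimizing J_DSM reduces, up to constants, to the familiar squared-error denoising criterion, which is the equivalence the abstract describes.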

Cited by 562 publications (482 citation statements)
References 11 publications (19 reference statements)

Citation statements, ordered by relevance:
“…In [16,34,8], advanced sampling methods for computing the negative part of the gradient, based on tempering, were proposed and shown to improve and stabilize learning. A single-layer DAE is a special form of multi-layer perceptron network with a single hidden layer and a tied set of weights [45] (see Figure 3 (b)). A DAE is a network that reconstructs a corrupted input vector as well as possible by minimizing the following cost function…”
Section: Restricted Boltzmann Machines and Denoising Autoencoders
confidence: 99%
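
The cost function is truncated in the excerpt above. As a minimal sketch of the usual setup it describes (Gaussian corruption, a single sigmoid hidden layer, tied weights, squared-error reconstruction against the clean input), consider the following; the names dae_cost, W, b_enc, b_dec, and sigma are illustrative assumptions, not identifiers from the cited work.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dae_cost(x, W, b_enc, b_dec, sigma=0.1, rng=np.random):
    # Corrupt the clean input with isotropic Gaussian noise.
    x_tilde = x + sigma * rng.standard_normal(x.shape)
    # Encode with a sigmoid hidden layer; decode with tied (transposed) weights.
    h = sigmoid(x_tilde @ W + b_enc)
    r = h @ W.T + b_dec
    # Average squared-error reconstruction cost, measured against the clean x.
    return 0.5 * np.mean(np.sum((r - x) ** 2, axis=1))

With x of shape (batch, d) and W of shape (d, k), this returns the mean per-example reconstruction error; in practice the gradient would be taken by automatic differentiation.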
“…This method, which consists of alternately adding noise to a sample and denoising it, yields competitive performance in terms of estimated log-likelihood of the samples. An important connection was also made by Vincent [27], who showed that optimising the training objective of a denoising autoencoder is equivalent to performing score matching [17] between the Parzen density estimator of the training data and a particular energy-based model. Composite denoising autoencoders learn a diverse representation by leveraging the observation that the types of features learnt by the standard denoising autoencoders differ depending on the level of noise.…”
Section: Introduction
confidence: 99%
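
The sampling scheme in the first sentence of this excerpt (alternately adding noise to a sample and denoising it) can be sketched as a simple Markov chain. The function name corrupt_and_denoise_chain and the placeholder denoise (a trained reconstruction function) are assumptions for illustration, not part of the cited method's published code.

import numpy as np

def corrupt_and_denoise_chain(denoise, x0, sigma=0.1, n_steps=100, rng=np.random):
    # Alternate between corrupting the current state and denoising it;
    # the visited states serve as (approximate) samples from the model.
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x_tilde = x + sigma * rng.standard_normal(x.shape)  # add noise
        x = denoise(x_tilde)                                 # denoise
        samples.append(x.copy())
    return samples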
“…It can be useful to allow the transfer function h for the decoder to be different from that for the encoder. Typically, W and W′ are constrained by W′ = Wᵀ, which has been justified theoretically by Vincent [27].…”
Section: Introduction
confidence: 99%
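
To make the tying constraint in this excerpt concrete: the decoder reuses the transposed encoder weights but keeps its own bias and may apply a different transfer function. This is an illustrative sketch; the choice of a tanh encoder and a linear decoder is an assumption, not taken from [27].

import numpy as np

def encode(x, W, b_enc):
    # Encoder with a tanh transfer function (illustrative choice).
    return np.tanh(x @ W + b_enc)

def decode(h, W, b_dec):
    # Tied weights: the decoder uses W transposed (W' = W^T) with its own
    # bias, and a different transfer function (here the identity, i.e. linear).
    return h @ W.T + b_dec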
“…The hidden layer learns a representation of the data, effectively encoding the original input. According to the literature [19][20][21][22][23][24][25][26][27][28][29], deep learning can extract more representative feature information by training on large-scale data. Samples can thus be classified and estimated with improved precision.…”
Section: Introduction
confidence: 99%