2017 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2017.7966331

Truncated variational EM for semi-supervised neural simpletrons

Abstract: Inference and learning for probabilistic generative networks are often very challenging and typically prevent scalability to networks as large as those used for deep discriminative approaches. To obtain efficiently trainable, large-scale and well performing generative networks for semi-supervised learning, we here combine two recent developments: a neural network reformulation of hierarchical Poisson mixtures (Neural Simpletrons), and a novel truncated variational EM approach (TV-EM). TV-EM provides theoretical guarantees…
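To make the truncation idea named in the abstract concrete, here is a minimal Python sketch, assuming a flat (non-hierarchical) Poisson mixture with uniform priors rather than the paper's full hierarchical model; the function name `truncated_poisson_posteriors` and the truncation parameter C are illustrative only, not taken from the paper.

```python
# Hedged sketch of a truncated variational E-step for a flat Poisson mixture:
# for each data point, only the C components with the highest joint probability
# are kept; all other posterior entries are set to exact zeros.
import numpy as np
from scipy.special import gammaln

def truncated_poisson_posteriors(X, W, C=3):
    """X: (N, D) non-negative count data, W: (K, D) Poisson rates, C: truncation."""
    eps = 1e-12
    # log p(x_n | k) for a product of Poisson distributions over the D dimensions
    log_lik = X @ np.log(W.T + eps) - W.sum(axis=1)[None, :] \
              - gammaln(X + 1).sum(axis=1, keepdims=True)          # (N, K)
    N, K = log_lik.shape
    keep = np.argsort(-log_lik, axis=1)[:, :C]                      # top-C per point
    q = np.zeros((N, K))
    rows = np.arange(N)[:, None]
    kept = log_lik[rows, keep]
    kept -= kept.max(axis=1, keepdims=True)                         # numerical stability
    w = np.exp(kept)
    q[rows, keep] = w / w.sum(axis=1, keepdims=True)                # renormalise over kept set
    return q   # truncated posteriors; hard zeros outside the kept sets
```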

Cited by 8 publications (12 citation statements)
References 21 publications
“…Previous work based on a fully probabilistic description of the Hebbian-learning network model (Forster et al., 2016; Forster and Lücke, 2017) shows that local Hebbian learning converges to the weight matrix B without requiring the non-local summation over k. This is true also when using a small fraction (≈1%) of labeled training examples.…”
Section: Methods
confidence: 99%
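For orientation, a minimal Python sketch of a purely local Hebbian update follows; the weight matrix B and the index k of the cited papers are not defined on this page, so the rule shown is a generic stand-in under that assumption, not the authors' exact update.

```python
# Hedged illustration only: a generic local Hebbian rule.  Each weight W[k, d]
# changes using only the pre-synaptic input x[d], the post-synaptic activity
# y[k] and the weight itself -- no summation over the unit index k appears
# anywhere in the weight update.
import numpy as np

def local_hebbian_step(W, x, y, lr=0.01):
    """W: (K, D) weights, x: (D,) input, y: (K,) unit activations."""
    return W + lr * y[:, None] * (x[None, :] - W)
```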
“…For all the above applications, our theoretical results show that the free energy (31) is the underlying objective function which is maximized. For the algorithms (Hughes and Sudderth, 2016; Forster and Lücke, 2017b), the TV-EM application to mixture models furthermore warrants that the free energy is provably monotonically increased, which follows from Prop. 5 and has not been shown previously.…”
Section: TV-EM for Mixture Models
confidence: 96%
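For orientation, the generic truncated free energy used throughout the TV-EM literature can be written as below; the cited equation (31) is presumably an instance of this form, so the notation here is a hedged reconstruction rather than a copy of that equation.

```latex
\[
  \mathcal{F}(\mathcal{K},\Theta)
  \;=\; \sum_{n=1}^{N} \log \sum_{k \in \mathcal{K}_n} p(\vec{x}_n, k \mid \Theta),
  \qquad
  q_n(k) \;=\; \frac{p(k \mid \vec{x}_n, \Theta)\,\mathbb{1}[k \in \mathcal{K}_n]}
                    {\sum_{k' \in \mathcal{K}_n} p(k' \mid \vec{x}_n, \Theta)},
\]
```

Here $\mathcal{K}_n$ is the small set of components kept for data point $n$; monotonic increase means that updating either the sets $\mathcal{K}_n$ (E-step) or the parameters $\Theta$ (M-step) never decreases $\mathcal{F}$.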
“…The main motivation and focus of the previous truncated approximations for mixture models (Hughes and Sudderth, 2016; Forster and Lücke, 2017b) was the increase of efficiency. The source of the reduction of computational effort was hereby the hard zeros introduced by truncated posteriors, which significantly reduced the required number of numerical operations in the M-step.…”
Section: TV-EM for Mixture Models
confidence: 99%
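As a hedged illustration of how the hard zeros cut the M-step cost, the sketch below assumes the truncated posteriors from a preceding E-step are stored sparsely as per-point index sets and weights; the function name `sparse_m_step` and this storage layout are assumptions, not taken from the cited work.

```python
# Sketch: an M-step for mixture means/rates that only touches the N*C non-zero
# responsibilities instead of all N*K entries of a dense responsibility matrix.
import numpy as np

def sparse_m_step(X, keep, q_kept, K):
    """X: (N, D) data, keep: (N, C) kept component indices,
    q_kept: (N, C) corresponding responsibilities, K: number of components."""
    N, D = X.shape
    sums = np.zeros((K, D))
    counts = np.zeros(K)
    for n in range(N):                       # only C terms per data point
        for c, k in enumerate(keep[n]):
            sums[k] += q_kept[n, c] * X[n]
            counts[k] += q_kept[n, c]
    return sums / np.maximum(counts, 1e-12)[:, None]   # updated component means
```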
“…For GMM clustering with isotropic clusters, this means disregarding clusters distant from a given data point [48]. Such neglection ideas have, also more generally, been observed to reduce computational demands for probabilistic clustering approaches [25], [27], [49], [50], [51] as well as for deterministic approaches, such as k-means or agglomerative clustering, e.g., [6], [7]. For k-means, e.g.…”
Section: Related Work
confidence: 99%
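A minimal sketch of the cluster-neglection idea for a k-means-style assignment step follows; the neighborhood search shown is an assumed simplification for illustration, not the algorithm of reference [48], and the function name is hypothetical.

```python
# Sketch: instead of comparing each point with all K centers, only search the
# G centers closest to the point's previous cluster; distant clusters are
# simply disregarded.
import numpy as np

def neighborhood_assignment(X, centers, prev_assign, G=5):
    """X: (N, D) data, centers: (K, D), prev_assign: (N,) previous cluster ids."""
    K = centers.shape[0]
    G = min(G, K)
    # For every cluster, precompute its G nearest clusters (including itself).
    cc_dist = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    neighbors = np.argsort(cc_dist, axis=1)[:, :G]                  # (K, G)

    new_assign = np.empty_like(prev_assign)
    for n in range(X.shape[0]):
        cand = neighbors[prev_assign[n]]                            # candidate clusters only
        d = ((X[n] - centers[cand]) ** 2).sum(-1)
        new_assign[n] = cand[np.argmin(d)]
    return new_assign
```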