In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via datadependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the minibatch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. As compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of magnitude smaller.
A common tool in the practice of Markov Chain Monte Carlo is to use approximating transition kernels to speed up computation when the true kernel is slow to evaluate. A relatively limited set of quantitative tools exist to determine whether the performance of such approximations will be well behaved and to assess the quality of approximation. We derive a set a tools for such analysis based on the Hilbert space generated by the stationary distribution we intend to sample, L 2 (π). The focus of our work is on determining whether the approximating kernel (i.e. perturbation) will preserve the geometric ergodicity of the chain, and whether the approximating stationary distribution will be close to the original stationary distribution. Our results directly generalise the results of [JMMD15] from the uniformly ergodic case to the geometrically ergodic case. We then apply our results to the class of 'Noisy MCMC' algorithms.
This paper concerns the control of text-guided generative models, where a user provides a natural language prompt and the model generates samples based on this input. Prompting is intuitive, general, and flexible. However, there are significant limitations: prompting can fail in surprising ways, and it is often unclear how to find a prompt that will elicit some desired target behavior. A core difficulty for developing methods to overcome these issues is that failures are know-it-when-you-see-it-it's hard to fix bugs if you can't state precisely what the model should have done! In this paper, we introduce a formalization of "what the user intended" in terms of latent concepts implicit to the data generating process that the model was trained on. This formalization allows us to identify some fundamental limitations of prompting. We then use the formalism to develop concept algebra to overcome these limitations. Concept algebra is a way of directly manipulating the concepts expressed in the output through algebraic operations on a suitably defined representation of input prompts. We give examples using concept algebra to overcome limitations of prompting, including concept transfer through arithmetic, and concept nullification through projection. Code available at https://github.com/zihao12/concept-algebra.
A common tool in the practice of Markov chain Monte Carlo (MCMC) is to use approximating transition kernels to speed up computation when the desired kernel is slow to evaluate or is intractable. A limited set of quantitative tools exists to assess the relative accuracy and efficiency of such approximations. We derive a set of tools for such analysis based on the Hilbert space generated by the stationary distribution we intend to sample, $L_2(\pi)$. Our results apply to approximations of reversible chains which are geometrically ergodic, as is typically the case for applications to MCMC. The focus of our work is on determining whether the approximating kernel will preserve the geometric ergodicity of the exact chain, and whether the approximating stationary distribution will be close to the original stationary distribution. For reversible chains, our results extend the results of Johndrow et al. (2015) from the uniformly ergodic case to the geometrically ergodic case, under some additional regularity conditions. We then apply our results to a number of approximate MCMC algorithms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.