We introduce a notion called entropic independence for distributions µ defined on pure simplicial complexes, i.e., subsets of size k of a ground set of elements. Informally, we call a background measure µ entropically independent if for any (possibly randomly chosen) set S, the relative entropy of an element of S drawn uniformly at random carries at most O(1/k) fraction of the relative entropy of S, a constant multiple of its "share of entropy." Entropic independence is the natural analog of spectral independence, another recently established notion, if one replaces variance by entropy. In our main result, we show that µ is entropically independent exactly when a transformed version of the generating polynomial of µ can be upper bounded by its linear tangent, a property implied by concavity of said transformation. We further show that this concavity is equivalent to spectral independence under arbitrary external fields, an assumption that also goes by the name of fractional log-concavity. Our result can be seen as a new tool for establishing entropy contraction from the much simpler variance contraction inequalities. A key differentiating feature of our result is that we make no assumptions on the marginals of µ or the degrees of the underlying graphical model when µ is based on one. We leverage our results to derive tight modified log-Sobolev inequalities for multi-step down-up walks on fractionally log-concave distributions. As our main application, we establish the tight mixing time of O(n log n) for Glauber dynamics on Ising models whose interaction matrix has operator norm smaller than 1, improving upon the prior quadratic dependence on n.
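To make the final claim concrete, here is a minimal sketch (ours, not the paper's construction) of single-site Glauber dynamics on an Ising model whose interaction matrix J has operator norm below 1; the mixing-time result says that on the order of n log n such updates suffice in this regime. All parameter choices below are illustrative.

```python
import numpy as np

def glauber_step(x, J, h, rng):
    """One Glauber update: pick a uniform site, resample its spin from the conditional law."""
    n = len(x)
    i = rng.integers(n)
    # For the Ising measure mu(x) ~ exp(x'Jx/2 + h'x), the conditional of x_i
    # given the other spins satisfies P(x_i = +1) = 1 / (1 + exp(-2 * field)).
    field = J[i] @ x - J[i, i] * x[i] + h[i]
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
    x[i] = 1 if rng.random() < p_plus else -1

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
J = 0.9 * A / np.linalg.norm(A, 2)        # rescale so the operator norm is < 1
h = np.zeros(n)
x = rng.choice([-1, 1], size=n).astype(float)
for _ in range(int(10 * n * np.log(n))):  # on the order of n log n updates
    glauber_step(x, J, h, rng)
```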
Graphical models are a rich language for describing high-dimensional distributions in terms of their dependence structure. While there are algorithms with provable guarantees for learning undirected graphical models in a variety of settings, there has been much less progress in the important scenario when there are latent variables. Here we study Restricted Boltzmann Machines (or RBMs), which are a popular model with wide-ranging applications in dimensionality reduction, collaborative filtering, topic modeling, feature extraction and deep learning. The main message of our paper is a strong dichotomy in the feasibility of learning RBMs, depending on the nature of the interactions between variables: ferromagnetic models can be learned efficiently, while general models cannot. In particular, we give a simple greedy algorithm based on influence maximization to learn ferromagnetic RBMs with bounded degree. In fact, we learn a description of the distribution on the observed variables as a Markov Random Field. Our analysis is based on tools from mathematical physics that were developed to show the concavity of magnetization. Our algorithm extends straightforwardly to general ferromagnetic Ising models with latent variables. Conversely, we show that even for a constant number of latent variables with constant degree, without ferromagneticity the problem is as hard as sparse parity with noise. This hardness result is based on a sharp and surprising characterization of the representational power of bounded-degree RBMs: the distribution on their observed variables can simulate any bounded-order MRF. This result is of independent interest since RBMs are the building blocks of deep belief networks.

Lemma 6.2. Suppose $X_i$ is the spin at vertex $i$ in an $(\alpha, \beta)$-nondegenerate Ising model and $j$ is a neighbor of $i$. Then for any fixing $x_{\neq i,j}$ of the other spins $X_{\neq i,j}$ of the Ising model, we have
$$\big|\mathbb{E}[X_i \mid X_j = 1, X_{\neq i,j} = x_{\neq i,j}] - \mathbb{E}[X_i \mid X_j = -1, X_{\neq i,j} = x_{\neq i,j}]\big| \geq 2\alpha\big(1 - \tanh^2(\beta)\big).$$
Since $\tanh'(x) = 1 - \tanh^2(x)$ and $\tanh$ is a monotone function, if we let $x = -J_{ij} + \sum_{k \notin \{i,j\}} J_{ik} x_k$, then, since $x \in [-\beta, \beta]$, the mean value theorem gives
$$|\tanh(x + 2J_{ij}) - \tanh(x)| \geq 2|J_{ij}| \inf_{x \in [-\beta, \beta]} \big(1 - \tanh^2(x)\big) \geq 2\alpha\big(1 - \tanh^2(\beta)\big).$$
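As a quick sanity check of the final inequality, the sketch below (ours, with illustrative values of α and β) samples couplings and fields within the stated ranges and verifies the lower bound numerically.

```python
import numpy as np

# Check: for x and x + 2*J_ij both in [-beta, beta] and J_ij >= alpha,
# |tanh(x + 2*J_ij) - tanh(x)| >= 2*alpha*(1 - tanh(beta)**2).
alpha, beta = 0.2, 1.5
rng = np.random.default_rng(1)
J_ij = rng.uniform(alpha, beta / 2, size=10_000)   # couplings with J_ij >= alpha
x = rng.uniform(-beta, beta - 2 * J_ij)            # keeps x + 2*J_ij <= beta
influence = np.abs(np.tanh(x + 2 * J_ij) - np.tanh(x))
bound = 2 * alpha * (1 - np.tanh(beta) ** 2)
assert (influence >= bound - 1e-12).all()
print(influence.min(), bound)
```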
We consider interpolation learning in high-dimensional linear regression with Gaussian data, and prove a generic uniform convergence guarantee on the generalization error of interpolators in an arbitrary hypothesis class in terms of the class's Gaussian width. Applying the generic bound to Euclidean norm balls recovers the consistency result of Bartlett et al. (2020) for minimum-norm interpolators, and confirms a prediction of Zhou et al. (2020) for near-minimal-norm interpolators in the special case of Gaussian data. We demonstrate the generality of the bound by applying it to the simplex, obtaining a novel consistency result for minimum $\ell_1$-norm interpolators (basis pursuit). Our results show how norm-based generalization bounds can explain and be used to analyze benign overfitting, at least in some settings.

* These authors contributed equally.
¹ Negrea et al. (2020) argue that Bartlett et al. (2020)'s proof technique is fundamentally based on uniform convergence of a surrogate predictor; Yang et al. (2021) study a closely related setting with a uniform convergence-type argument, but do not establish consistency. We discuss both papers in more detail in Section 4.
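For concreteness, here is a small sketch (ours; the setup and parameters are illustrative) computing the two interpolators the abstract compares, on Gaussian data with d > n: the minimum $\ell_2$-norm interpolator via the pseudoinverse, and the minimum $\ell_1$-norm interpolator (basis pursuit) as a linear program.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 50, 200                                   # overparameterized: d > n
X = rng.standard_normal((n, d))
w_star = np.zeros(d); w_star[:5] = 1.0
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Minimum l2-norm interpolator: w = X^+ y.
w_l2 = np.linalg.pinv(X) @ y

# Basis pursuit, min ||w||_1 s.t. Xw = y, as an LP in (w+, w-) with w = w+ - w-.
c = np.ones(2 * d)
res = linprog(c, A_eq=np.hstack([X, -X]), b_eq=y, bounds=[(0, None)] * (2 * d))
w_l1 = res.x[:d] - res.x[d:]

# Both fit the training data exactly (up to solver tolerance).
print(np.linalg.norm(X @ w_l2 - y), np.linalg.norm(X @ w_l1 - y))
```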
Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian N(0, Σ), and we seek an estimator with small excess risk. If the true signal is t-sparse, it is information-theoretically possible to achieve strong recovery guarantees with only O(t log n) samples. However, computationally efficient algorithms have sample complexity linear in (some variant of) the condition number of Σ. Classical algorithms such as the Lasso can require significantly more samples than necessary, even if there is only a single sparse approximate dependency among the covariates. We provide a polynomial-time algorithm that, given Σ, automatically adapts the Lasso to tolerate a small number of approximate dependencies. In particular, we achieve near-optimal sample complexity when the sparsity is constant and Σ has few "outlier" eigenvalues. Our algorithm fits into a broader framework of feature adaptation for sparse linear regression with ill-conditioned covariates. Within this framework, we additionally provide the first polynomial-factor improvement over brute-force search for constant sparsity t and arbitrary covariance Σ.
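The sketch below (ours; an assumed toy setup, not the paper's algorithm) illustrates the failure mode the abstract describes: a single near-dependency between two covariates in Σ creates one small "outlier" eigenvalue, and vanilla Lasso is then run on data drawn from N(0, Σ).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, t = 200, 100, 5
# Identity covariance plus one approximate dependency between covariates 0 and 1:
# the 2x2 block [[1, .99], [.99, 1]] contributes a tiny eigenvalue 0.01.
Sigma = np.eye(p)
Sigma[0, 1] = Sigma[1, 0] = 0.99

L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T            # rows ~ N(0, Sigma)
w_star = np.zeros(p); w_star[:t] = 1.0           # t-sparse signal hitting the bad pair
y = X @ w_star + 0.5 * rng.standard_normal(n)

w_hat = Lasso(alpha=0.1).fit(X, y).coef_
excess_risk = (w_hat - w_star) @ Sigma @ (w_hat - w_star)
print(excess_risk)
```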
We prove that Ising models on the hypercube with general quadratic interactions satisfy a Poincaré inequality with respect to the natural Dirichlet form corresponding to Glauber dynamics, as soon as the operator norm of the interaction matrix is smaller than 1. The inequality yields control of the mixing time of the Glauber dynamics. Our techniques rely on a localization procedure which establishes a structural result, stating that Ising measures may be decomposed into a mixture of measures with rank-one quadratic potentials, and which provides a framework for proving concentration bounds for high-temperature Ising models.
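On a small hypercube the conclusion can be checked directly: the sketch below (ours; a brute-force numerical check, not the paper's argument) builds the exact Glauber transition matrix on {-1, 1}^n for small n and reports the spectral gap, which governs the Poincaré constant of the associated Dirichlet form, as the operator norm of J approaches 1.

```python
import itertools
import numpy as np

def glauber_gap(J, h):
    """Spectral gap 1 - lambda_2 of the single-site Glauber chain on {-1,1}^n."""
    n = J.shape[0]
    states = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
    index = {tuple(s): k for k, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for k, s in enumerate(states):
        for i in range(n):                       # site i chosen with probability 1/n
            field = J[i] @ s - J[i, i] * s[i] + h[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            for spin, prob in ((1, p_plus), (-1, 1.0 - p_plus)):
                t = s.copy(); t[i] = spin
                P[k, index[tuple(t)]] += prob / n
    eig = np.sort(np.linalg.eigvals(P).real)     # reversible chain: real spectrum
    return 1.0 - eig[-2]

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
for norm in (0.5, 0.9, 0.99):
    J = norm * A / np.linalg.norm(A, 2)
    print(norm, glauber_gap(J, np.zeros(n)))
```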