Entropy and information provide natural measures of correlation among elements in a network. We construct here the information theoretic analog of connected correlation functions: irreducible N-point correlation is measured by a decrease in entropy for the joint distribution of N variables relative to the maximum entropy allowed by all the observed N-1 variable distributions. We calculate the "connected information" terms for several examples, and show that these terms also enable the decomposition of the information that a population of elements carries about an outside source.

Keywords: entropy, information, multi-information, redundancy, synergy, correlation, network

In statistical physics and field theory, the nature of order in a system is characterized by correlation functions. These ideas are especially powerful because there is a direct relation between the correlation functions and experimental observables such as scattering cross sections and susceptibilities. As we move toward the analysis of more complex systems, such as the interactions among genes or neurons in a network, it is not obvious how to construct correlation functions that capture the underlying order. On the other hand, it is possible to observe directly the activity of many single neurons in a network or the expression levels of many genes, and hence real experiments in these systems are more like Monte Carlo simulations, sampling the distribution of network states.

Shannon proved that, given a probability distribution over a set of variables, entropy is the unique measure of what can be learned by observing these variables, given certain simple and plausible criteria (continuity, monotonicity, and additivity) [1]. By the same arguments, mutual information arises as the unique measure of the interdependence of two variables, or two sets of variables. Defining information theoretic analogs of higher order correlations has proved to be more difficult [2,3,4,5,6,7,8,9,10]. When we compute N-point correlation functions in statistical physics and field theory, we are careful to isolate the connected correlations, which are the components of the N-point correlation that cannot be factored into correlations among groups of fewer than N observables. We propose here an analogous measure of "connected information" which generalizes precisely our intuition about connectedness and interactions from field theory; a closely related discussion for quantum information has been given recently [11].

Consider N variables {x_i}, i = 1, 2, ..., N, drawn from the joint probability distribution P({x_i}); this has an entropy [12]

S(\{x_i\}) = -\sum_{\{x_i\}} P(\{x_i\}) \log_2 P(\{x_i\}).

The fact that the N variables are correlated means that the entropy S({x_i}) is smaller than the sum of the entropies of the individual variables,

S(\{x_i\}) \le \sum_{i=1}^{N} S(x_i).

The total difference in entropy between the interacting variables and the variables taken independently can be written as [2,3]

I(\{x_i\}) = \sum_{i=1}^{N} S(x_i) - S(\{x_i\}).
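As a rough numerical illustration of these quantities (the toy joint distribution and function names below are ours, not from the paper), one can compute the joint entropy and the difference from the sum of marginal entropies, i.e., the multi-information, directly:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries contribute 0)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def multi_information(joint):
    """Sum of marginal entropies minus the joint entropy: I({x_i}) = sum_i S(x_i) - S({x_i})."""
    joint = np.asarray(joint, dtype=float)
    marginal_entropy_sum = sum(
        entropy(joint.sum(axis=tuple(j for j in range(joint.ndim) if j != i)))
        for i in range(joint.ndim)
    )
    return marginal_entropy_sum - entropy(joint)

# Toy example: three binary variables that are perfectly correlated.
joint = np.zeros((2, 2, 2))
joint[0, 0, 0] = joint[1, 1, 1] = 0.5
print(multi_information(joint))  # 3 * 1 bit - 1 bit = 2 bits
```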
A system responding to a stochastic driving signal can be interpreted as computing, by means of its dynamics, an implicit model of the environmental variables. The system's state retains information about past environmental fluctuations, and a fraction of this information is predictive of future ones. The remaining nonpredictive information reflects model complexity that does not improve predictive power, and thus represents the ineffectiveness of the model. We expose the fundamental equivalence between this model inefficiency and thermodynamic inefficiency, measured by the energy dissipated during the interaction between system and environment. Our results hold arbitrarily far from thermodynamic equilibrium and are applicable to a wide range of systems, including biomolecular machines. They highlight a profound connection between the effective use of information and efficient thermodynamic operation: any system constructed to keep memory about its environment and to operate with maximal energetic efficiency has to be predictive.
We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration-exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration-exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emerging behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.
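The Boltzmann-style policies discussed above take the form of a softmax over action values, weighted by a prior policy, with an inverse temperature that sets the trade-off between expected return and coding cost. A minimal sketch follows; the Q-values, the uniform prior, and the beta values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def boltzmann_policy(q_values, prior, beta):
    """Softmax policy pi(a) proportional to prior(a) * exp(beta * Q(a)).
    beta trades expected return against deviation (coding cost) from the prior."""
    logits = beta * np.asarray(q_values, dtype=float) + np.log(np.asarray(prior, dtype=float))
    logits -= logits.max()              # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Illustrative action values and a uniform prior over three actions.
q = np.array([1.0, 0.5, 0.2])
prior = np.ones(3) / 3
for beta in (0.1, 1.0, 10.0):           # low beta: near-uniform exploration; high beta: near-greedy
    print(beta, boltzmann_policy(q, prior, beta))
```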
Extreme volcanism on Io results from tidal heating, but its tidal dissipation mechanisms and …
We introduce an approach to inferring the causal architecture of stochastic dynamical systems that extends rate distortion theory to use causal shielding, a natural principle of learning. We study two distinct cases of causal inference: optimal causal filtering and optimal causal estimation.

Filtering corresponds to the ideal case in which the probability distribution of measurement sequences is known, giving a principled method to approximate a system's causal structure at a desired level of representation. We show that, in the limit in which a model complexity constraint is relaxed, filtering finds the exact causal architecture of a stochastic dynamical system, known as the causal-state partition. From this, one can estimate the amount of historical information the process stores. More generally, causal filtering finds a graded model-complexity hierarchy of approximations to the causal architecture. Abrupt changes in the hierarchy, as a function of approximation, capture distinct scales of structural organization.

For nonideal cases with finite data, we show how the correct number of underlying causal states can be found by optimal causal estimation. A previously derived model complexity control term allows us to correct for the effect of statistical fluctuations in probability estimates and thereby avoid over-fitting.

Natural systems compute intrinsically and produce information. This organization, often only indirectly accessible to an observer, is reflected to varying degrees in measured time series. Nonetheless, this information can be used to build models of varying complexity that capture the causal architecture of the underlying system and allow one to estimate its information processing capabilities. We investigate two cases. The first is when a model builder wishes to find a more compact representation than the true one. This occurs, for example, when one is willing to incur the cost of a small increase in error for a large reduction in model size. The second case concerns the empirical setting in which only a finite amount of data is available. There one wishes to avoid over-fitting a model to a particular data set.
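The filtering step described here has the same variational structure as an information-bottleneck problem over pasts and futures: compress the past into a small set of states while preserving information about the future, with a parameter that controls model complexity. Below is a minimal sketch of such a self-consistent iteration, assuming the joint distribution P(past, future) is given as a matrix with strictly positive past marginals. Variable names, the random initialization, and the fixed iteration count are illustrative assumptions; this is not the authors' code.

```python
import numpy as np

def causal_filter(p_joint, n_states, beta, n_iter=200, seed=0):
    """Soft compression of 'past' into n_states representations R that retain
    information about 'future'.  p_joint[i, j] = P(past=i, future=j)."""
    rng = np.random.default_rng(seed)
    p_joint = np.asarray(p_joint, dtype=float)
    p_past = p_joint.sum(axis=1)                          # P(past)
    p_fut_given_past = p_joint / p_past[:, None]          # P(future | past)

    # Random soft assignment P(r | past) to start.
    p_r_given_past = rng.random((len(p_past), n_states))
    p_r_given_past /= p_r_given_past.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_r = p_past @ p_r_given_past                     # P(r)
        # P(r, future) = sum_past P(r|past) P(past) P(future|past), then normalize rows.
        p_r_fut = (p_r_given_past * p_past[:, None]).T @ p_fut_given_past
        p_fut_given_r = p_r_fut / p_r_fut.sum(axis=1, keepdims=True)
        # KL divergence D[ P(future|past) || P(future|r) ] for every (past, r) pair.
        log_ratio = (np.log(p_fut_given_past[:, None, :] + 1e-12)
                     - np.log(p_fut_given_r[None, :, :] + 1e-12))
        d = np.sum(p_fut_given_past[:, None, :] * log_ratio, axis=2)
        # Self-consistent update of the soft partition; beta sets the complexity trade-off.
        p_r_given_past = p_r[None, :] * np.exp(-beta * d)
        p_r_given_past /= p_r_given_past.sum(axis=1, keepdims=True)
    return p_r_given_past

# Toy joint over 4 past values and 3 future values: the first two pasts predict
# similar futures, as do the last two, so two states should suffice at large beta.
p = np.array([[0.20, 0.04, 0.01],
              [0.18, 0.05, 0.02],
              [0.01, 0.05, 0.19],
              [0.02, 0.04, 0.19]])
print(np.round(causal_filter(p, n_states=2, beta=5.0), 3))
```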