Rényi divergence is related to Rényi entropy much as Kullback-Leibler divergence is related to Shannon entropy, and it comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and it depends on a parameter called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of σ-algebras, and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
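As a concrete illustration (not part of the abstract), the Rényi divergence of order α between finite distributions is D_α(P‖Q) = (α − 1)⁻¹ ln ∑ᵢ pᵢ^α qᵢ^(1−α), and letting α → 1 recovers the Kullback-Leibler divergence, as the abstract notes. A minimal Python sketch, with function and variable names of my own choosing:

```python
import math

def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(P||Q) for finite distributions, in nats.

    D_alpha = 1/(alpha - 1) * ln sum_i p_i^alpha * q_i^(1 - alpha) for alpha != 1;
    the order-1 case is the Kullback-Leibler divergence, obtained as a limit.
    """
    if abs(alpha - 1.0) < 1e-12:  # order 1: Kullback-Leibler divergence
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
kl = renyi_divergence(p, q, 1.0)
# Orders approaching 1 converge to the KL divergence:
for a in (0.5, 0.9, 0.99, 1.0, 1.01, 1.5):
    print(a, renyi_divergence(p, q, a))
```

Rényi divergence is nondecreasing in the order, so the printed values increase with α; the values near α = 1 bracket the KL divergence.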
In its modern formulation, the Maximum Entropy Principle was promoted by E.T. Jaynes, starting in the mid-fifties. The principle dictates that one should look for a distribution, consistent with available information, which maximizes the entropy. However, this principle focuses only on distributions, and it appears advantageous to bring information-theoretical thinking more prominently into play by also focusing on the "observer" and on coding. This view was brought forward by the second named author in the late seventies and is the view we will follow up on here. It leads to the consideration of a certain game, the Code Length Game, and, via standard game-theoretical thinking, to a principle of Game Theoretical Equilibrium. This principle is more basic than the Maximum Entropy Principle in the sense that the search for one type of optimal strategies in the Code Length Game translates directly into the search for distributions with maximum entropy. In the present paper we offer a self-contained and comprehensive treatment of the fundamentals of both principles mentioned, based on a study of the Code Length Game. Though new concepts and results are presented, the reading should be instructional and accessible to a rather wide audience, at least if certain mathematical details are left aside at a first reading. The most frequently studied instance of entropy maximization pertains to the Mean Energy Model, which involves a moment constraint related to a given function, here taken to represent "energy". This type of application is very well known from the literature, with hundreds of applications pertaining to several different fields, and will also serve here as an important illustration of the theory. But our approach reaches further, especially regarding the study of continuity properties of the entropy function, and this leads to new results which allow a discussion of models with so-called entropy loss.
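To illustrate the Mean Energy Model mentioned above (this example is mine, not the paper's): maximizing entropy subject to a mean-energy constraint ∑ pᵢEᵢ = m yields a Gibbs distribution pᵢ ∝ exp(−βEᵢ), where β is chosen so the constraint holds. A sketch, with the energies `E` and target mean chosen arbitrarily:

```python
import math

def gibbs(energies, beta):
    """Gibbs distribution p_i proportional to exp(-beta * E_i)."""
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)
    return [wi / z for wi in w]

def mean_energy(p, energies):
    return sum(pi * e for pi, e in zip(p, energies))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def maxent_distribution(energies, target, lo=-50.0, hi=50.0, iters=100):
    """Bisect on beta so the Gibbs mean energy hits the target
    (mean energy is strictly decreasing in beta)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_energy(gibbs(energies, mid), energies) > target:
            lo = mid
        else:
            hi = mid
    return gibbs(energies, (lo + hi) / 2)

E = [0.0, 1.0, 2.0]
p_star = maxent_distribution(E, target=0.8)
# Any other distribution with the same mean energy has lower entropy:
p_alt = [0.4, 0.4, 0.2]  # mean energy 0.4*1 + 0.2*2 = 0.8
print(entropy(p_star), entropy(p_alt))
```

The comparison with `p_alt` shows the defining property of the maximum-entropy strategy: among all distributions satisfying the moment constraint, the Gibbs distribution has the largest entropy.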
These results have tempted us to speculate about the development of natural languages. In fact, we are able to relate our theoretical findings to the empirically found Zipf's law, which involves statistical aspects of words in a language. The apparent irregularity inherent in models with entropy loss turns out to imply desirable stability properties of languages.
Let V and D denote, respectively, total variation and divergence. We study lower bounds on D with V fixed. The theoretically best (i.e. largest) lower bound determines a function L = L(V), Vajda's tight lower bound, cf. Vajda [?]. The main result is an exact parametrization of L. This leads to Taylor polynomials which are lower bounds for L, and thereby to extensions of the classical Pinsker inequality, which has numerous applications, cf. Pinsker [?] and followers.
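As a numerical check (my own illustration, with V = ∑|pᵢ − qᵢ| ∈ [0, 2] and D in nats), the classical Pinsker inequality reads D ≥ V²/2, and the Taylor-polynomial refinement mentioned above adds higher-order terms such as V⁴/36:

```python
import math, random

def kl(p, q):
    """Kullback-Leibler divergence D(P||Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """V = sum_i |p_i - q_i|, taking values in [0, 2]."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_dist(n, rng):
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

rng = random.Random(0)
for _ in range(1000):
    p = random_dist(4, rng)
    q = random_dist(4, rng)
    D, V = kl(p, q), total_variation(p, q)
    # Pinsker: D >= V^2/2; the refined lower bound adds the V^4/36 term.
    assert D >= V**2 / 2
    assert D >= V**2 / 2 + V**4 / 36
print("Pinsker inequality verified on 1000 random pairs")
```

Both inequalities hold on every sampled pair, consistent with the fact that these polynomials lie below the tight bound L(V).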
Jensen-Shannon divergence (JD) is a symmetrized and smoothed version of the most important divergence measure of information theory, Kullback divergence. As opposed to Kullback divergence, it determines in a very direct way a metric; indeed, it is the square of a metric. We consider a family of divergence measures (JD_α for α > 0), the Jensen divergences of order α, which generalize JD as JD_1 = JD. Using a result of Schoenberg, we prove that JD_α is the square of a metric for α ∈ (0,2], and that the resulting metric space of probability distributions can be isometrically embedded in a real Hilbert space. Quantum Jensen-Shannon divergence (QJD) is a symmetrized and smoothed version of quantum relative entropy and can be extended to a family of quantum Jensen divergences of order α (QJD_α). We strengthen results by Lamberti and co-workers by proving that for qubits and pure states, QJD_α^{1/2} is a metric and the resulting metric space can be isometrically embedded in a real Hilbert space when α ∈ (0,2]. In analogy with Burbea and Rao's generalization of JD, we also define general QJD by associating a Jensen-type quantity to any weighted family of states. Appropriate interpretations of the quantities introduced are discussed, and bounds are derived in terms of the total variation and trace distance.
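For the classical order-1 case, JD(P, Q) = H((P+Q)/2) − (H(P) + H(Q))/2, and the abstract's metric claim says that √JD satisfies the triangle inequality. A small numerical sketch of that property (my own illustration, using natural logarithms):

```python
import math, random

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def jsd(p, q):
    """Jensen-Shannon divergence JD(P, Q) = H((P+Q)/2) - (H(P)+H(Q))/2."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def jsd_metric(p, q):
    """The square root of JD, which is a metric on distributions."""
    return math.sqrt(max(jsd(p, q), 0.0))

def random_dist(n, rng):
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(1)
for _ in range(500):
    p, q, r = (random_dist(3, rng) for _ in range(3))
    # Triangle inequality for the metric sqrt(JD):
    assert jsd_metric(p, r) <= jsd_metric(p, q) + jsd_metric(q, r) + 1e-12
print("triangle inequality holds on 500 random triples")
```

JD itself is not a metric (it fails the triangle inequality), which is why the square-root structure established in the paper matters.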
To any discrete probability distribution P we can associate its entropy H(P) = −∑ p_i ln p_i and its index of coincidence IC(P) = ∑ p_i². The main result of the paper is the determination of the precise range of the map P ↦ (IC(P), H(P)). The range looks much like that of the map P ↦ (P_max, H(P)), where P_max is the maximal point probability, cf. research from 1965 (Kovalevskij) to 1994 (Feder and Merhav). The earlier results, which actually focus on the probability of error 1 − P_max rather than P_max, can be conceived as limiting cases of results obtained by the methods presented here. Ranges of maps such as those indicated are called Information Diagrams. The main result gives rise to precise lower as well as upper bounds for the entropy function. Some of these bounds are essential for the exact solution of certain problems of universal coding and prediction for Bernoulli sources. Other applications concern Shannon theory (relations between various measures of divergence), statistical decision theory and rate distortion theory. Two methods are developed. One is topological; the other involves convex analysis and is based on a "lemma of replacement" which is of independent interest in relation to problems of optimization of mixed type (concave/convex optimization).
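A quick way to see how the two coordinates of the Information Diagram relate (this numerical check is mine, not from the paper): −ln IC(P) is the Rényi entropy of order 2, which never exceeds the Shannon entropy H(P), and both quantities are extremal at the uniform distribution:

```python
import math, random

def entropy(p):
    """Shannon entropy H(P) = -sum p_i ln p_i."""
    return -sum(x * math.log(x) for x in p if x > 0)

def index_of_coincidence(p):
    """IC(P) = sum p_i^2, the probability that two independent draws coincide."""
    return sum(x * x for x in p)

def random_dist(n, rng):
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(2)
for _ in range(1000):
    p = random_dist(5, rng)
    H, ic = entropy(p), index_of_coincidence(p)
    # -ln IC is the Rényi entropy of order 2, which never exceeds H:
    assert -math.log(ic) <= H + 1e-12
    # both coordinates are extremal at the uniform distribution on 5 points:
    assert ic >= 1 / 5 and H <= math.log(5) + 1e-12
print("checked 1000 samples of the (IC, H) information diagram")
```

The paper's contribution is much sharper than these elementary inequalities: it determines the exact region that the sampled (IC, H) points fill out.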
Two new information-theoretic methods are introduced for establishing Poisson approximation inequalities. First, using only elementary information-theoretic techniques it is shown that, when S_n = ∑_{i=1}^n X_i is the sum of the (possibly dependent) binary random variables X_1, X_2, …, X_n, with E(X_i) = p_i and E(S_n) = λ, then

D(P_{S_n} ‖ Po(λ)) ≤ ∑_{i=1}^n p_i² + [∑_{i=1}^n H(X_i) − H(X_1, X_2, …, X_n)],

where D(P_{S_n} ‖ Po(λ)) is the relative entropy between the distribution of S_n and the Poisson(λ) distribution. The first term in this bound measures the individual smallness of the X_i and the second term measures their dependence. A general method is outlined for obtaining corresponding bounds when approximating the distribution of a sum of general discrete random variables by an infinitely divisible distribution. Second, in the particular case when the X_i are independent, the following sharper bound is established:

D(P_{S_n} ‖ Po(λ)) ≤ (1/λ) ∑_{i=1}^n p_i³ / (1 − p_i),

and it is also generalized to the case when the X_i are general integer-valued random variables. Its proof is based on the derivation of a subadditivity property for a new discrete version of the Fisher information, and uses a recent logarithmic Sobolev inequality for the Poisson distribution.
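In the independent case the bounds can be checked exactly for small n, since the distribution of S_n is a finite convolution. A sketch (my own illustration, with arbitrarily chosen p_i; I take the sharper independent-case bound to be (1/λ) ∑ p_i³/(1 − p_i), as stated above):

```python
import math

def bernoulli_sum_dist(ps):
    """Exact distribution of S_n = X_1 + ... + X_n for independent
    Bernoulli(p_i), computed by sequential convolution."""
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)      # X_i = 0
            new[k + 1] += mass * p        # X_i = 1
        dist = new
    return dist

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

ps = [0.1, 0.05, 0.2, 0.15]
lam = sum(ps)
dist = bernoulli_sum_dist(ps)
# Relative entropy D(P_Sn || Po(lambda)), summed over the support of S_n:
D = sum(mass * math.log(mass / poisson_pmf(lam, k))
        for k, mass in enumerate(dist) if mass > 0)
bound1 = sum(p * p for p in ps)                 # first bound (dependence term is 0)
bound2 = sum(p**3 / (1 - p) for p in ps) / lam  # sharper independent-case bound
print(D, bound1, bound2)
```

For independent summands the dependence term ∑H(X_i) − H(X_1, …, X_n) vanishes, so the first bound reduces to ∑p_i², and the computed D sits below both bounds.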
We compare two f-divergences and prove that their joint range is the convex hull of the joint range for distributions supported on only two points. Some applications of this result are given.
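As a concrete reading of this statement (my own illustration): taking the two f-divergences to be total variation (f(t) = |t − 1|) and Kullback-Leibler divergence (f(t) = t ln t), the set of achievable pairs (V, D) over all distribution pairs equals the convex hull of the pairs produced by two-point distributions, which are easy to enumerate numerically:

```python
import math, random

def kl(p, q):
    """Kullback-Leibler divergence, the f-divergence with f(t) = t ln t."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def variation(p, q):
    """Total variation V = sum |p_i - q_i|, the f-divergence with f(t) = |t - 1|."""
    return sum(abs(x - y) for x, y in zip(p, q))

rng = random.Random(3)
pts = []
for _ in range(2000):
    a, b = rng.uniform(0.01, 0.99), rng.uniform(0.01, 0.99)
    P, Q = [a, 1 - a], [b, 1 - b]
    pts.append((variation(P, Q), kl(P, Q)))
# By the theorem, the joint range of (V, D) over ALL pairs of distributions
# is exactly the convex hull of this two-point cloud.
print(len(pts), "points sampled from the two-point joint range")
```

This reduction to two-point distributions is what makes results such as Vajda's tight lower bound computable by a one-dimensional parametrization.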