Consider the partial sums {S_t} of a real-valued functional F(Φ(t)) of a Markov chain {Φ(t)} with values in a general state space. Assuming only that the Markov chain is geometrically ergodic and that the functional F is bounded, the following conclusions are obtained:

Spectral theory: Well-behaved solutions f̂ can be constructed for the "multiplicative Poisson equation" (e^{αF} P) f̂ = λ f̂, where P is the transition kernel of the Markov chain and α ∈ C is a constant. The function f̂ is an eigenfunction, with corresponding eigenvalue λ, for the kernel (e^{αF} P)(x, dy) = e^{αF(x)} P(x, dy).

A "multiplicative" mean ergodic theorem: For all complex α in a neighborhood of the origin, the normalized mean of exp(α S_t) (and not the logarithm of the mean) converges to f̂ exponentially fast, where f̂ is a solution of the multiplicative Poisson equation.

Edgeworth expansions: Rates are obtained for the convergence of the distribution function of the normalized partial sums S_t to the standard Gaussian distribution. The first term in this expansion is of order (1/√t), and it depends on the initial condition of the Markov chain through the solution F̂ of the associated Poisson equation (and not the solution f̂ of the multiplicative Poisson equation).

Large deviations: The partial sums are shown to satisfy a large deviations principle in a neighborhood of the mean. This result, proved under geometric ergodicity alone, cannot in general be extended to the whole real line.

Exact large deviations asymptotics: Rates of convergence are obtained for the large deviations estimates above. The polynomial pre-exponent is of order (1/√t), and its coefficient depends on the initial condition of the Markov chain through the solution f̂ of the multiplicative Poisson equation.

Extensions of these results to continuous-time Markov processes are also given.
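On a finite state space the tilted kernel e^{αF} P is just a matrix, so the maximal eigenpair can be computed directly as a Perron-Frobenius eigenvalue problem. Below is a minimal numerical sketch; the chain P, the functional F, and the value of α are illustrative choices, not taken from the paper.

```python
# Sketch: solving the multiplicative Poisson equation (e^{aF} P) fhat = lam * fhat
# on a toy finite state space. On a finite space the tilted "kernel" is a matrix
# and the maximal solution is its Perron-Frobenius eigenpair.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])         # transition matrix of a 2-state chain (illustrative)
F = np.array([1.0, -1.0])          # a bounded functional F on the state space
alpha = 0.3                        # real alpha in a neighborhood of the origin

P_tilted = np.exp(alpha * F)[:, None] * P   # (e^{alpha F} P)(x, y)

eigvals, eigvecs = np.linalg.eig(P_tilted)
k = np.argmax(eigvals.real)                 # Perron root = largest eigenvalue
lam = eigvals[k].real
fhat = eigvecs[:, k].real
fhat = fhat / fhat[0]                       # fix a normalization

print("eigenvalue lambda  =", lam)
print("eigenfunction fhat =", fhat)
# Check the multiplicative Poisson equation: (e^{alpha F} P) fhat = lam * fhat
assert np.allclose(P_tilted @ fhat, lam * fhat)
```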
In this paper we continue the investigation of the spectral theory and exponential asymptotics of primarily discrete-time Markov processes, following Kontoyiannis and Meyn [32]. We introduce a new family of nonlinear Lyapunov drift criteria, which characterize distinct subclasses of geometrically ergodic Markov processes in terms of simple inequalities for the nonlinear generator. We concentrate primarily on the class of multiplicatively regular Markov processes, which are characterized via simple conditions similar to (but weaker than) those of Donsker-Varadhan. For any such process Φ = {Φ(t)} with transition kernel P on a general state space X, the following are obtained:

Spectral theory: For a large class of (possibly unbounded) functionals F : X → C, the kernel P̂(x, dy) = e^{F(x)} P(x, dy) has a discrete spectrum in an appropriately defined Banach space. It follows that there exists a "maximal" solution (λ, f̂) to the multiplicative Poisson equation, defined as the eigenvalue problem P̂ f̂ = λ f̂. The functional Λ(F) = log(λ) is convex and smooth, and its convex dual Λ* is convex with compact sublevel sets.

Multiplicative mean ergodic theorem: Consider the partial sums {S_t} of the process with respect to any one of the functionals F(Φ(t)) considered above. The normalized mean E_x[exp(S_t)] (and not the logarithm of the mean) converges to f̂(x) exponentially fast, where f̂ is the above solution of the multiplicative Poisson equation.

Multiplicative regularity: The Lyapunov drift criterion under which our results are derived is equivalent to the existence of regeneration times with finite exponential moments for the partial sums {S_t}, with respect to any functional F in the above class.

Large deviations: The sequence of empirical measures of {Φ(t)} satisfies a large deviations principle in the "τ^{W₀}-topology," a topology finer than the usual τ-topology, generated by the above class of functionals F on X, which is strictly larger than L_∞(X). The rate function of this LDP is Λ*, and it is shown to coincide with the Donsker-Varadhan rate function in terms of relative entropy.

Exact large deviations asymptotics: The above partial sums {S_t} are shown to satisfy an exact large deviations expansion, analogous to that obtained by Bahadur and Ranga Rao for independent random variables.
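As a finite-state illustration of the convexity statements above, the sketch below evaluates Λ along the one-parameter family aF and approximates the convex dual Λ* by a discrete Legendre-Fenchel transform over a grid. The chain P, the functional F, and the grid are illustrative assumptions, not objects from the paper.

```python
# Sketch: computing Lambda(aF) = log(lambda(a)) for a scaled functional aF on a
# finite chain, and its convex dual Lambda* via a discrete Legendre transform.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
F = np.array([1.0, -1.0])

def Lambda(a):
    """log of the Perron eigenvalue of the tilted kernel e^{aF(x)} P(x, dy)."""
    tilted = np.exp(a * F)[:, None] * P
    return np.log(np.max(np.linalg.eigvals(tilted).real))

a_grid = np.linspace(-3, 3, 601)
L_vals = np.array([Lambda(a) for a in a_grid])

def Lambda_star(c):
    """Legendre-Fenchel dual: Lambda*(c) = sup_a [a*c - Lambda(a)], over the grid."""
    return np.max(a_grid * c - L_vals)

for c in [-0.5, 0.0, 0.5]:
    print(f"Lambda*({c}) = {Lambda_star(c):.4f}")   # rate function values
```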
We discuss a family of estimators for the entropy rate of a stationary ergodic process and prove their pointwise and mean consistency under a Doeblin-type mixing condition. The estimators are Cesàro averages of longest match-lengths, and their consistency follows from a generalized ergodic theorem due to Maker. We provide examples of their performance on English text, and we generalize our results to countable alphabet processes and to random fields.
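To make the construction concrete, here is a sketch of a match-length estimator of this general kind: L_i is the length of the shortest string starting at position i that does not occur in the preceding window, and the entropy rate is estimated through a Cesàro average of the ratios L_i / log₂ i. This is an illustrative variant, not necessarily the exact estimator analyzed in the paper.

```python
# Illustrative match-length entropy estimator: the Cesaro average of
# L_i / log2(i) approximates 1/H, so its reciprocal estimates the entropy
# rate H in bits per symbol.
import math
import random

def match_length(x, i):
    """Length of the shortest prefix of x[i:] not occurring as a substring of x[:i]."""
    window = x[:i]
    k = 1
    while i + k <= len(x) and x[i:i + k] in window:
        k += 1
    return k

def entropy_estimate(x, start=2):
    """Reciprocal of the Cesaro average of L_i / log2(i), over i = start..n-1."""
    ratios = [match_length(x, i) / math.log2(i) for i in range(start, len(x))]
    return 1.0 / (sum(ratios) / len(ratios))

# Example: a fair-coin i.i.d. sequence should give an estimate near 1 bit/symbol.
random.seed(0)
x = "".join(random.choice("01") for _ in range(5000))
print(f"estimated entropy rate: {entropy_estimate(x):.3f} bits/symbol")
```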
Suppose P is an arbitrary discrete distribution on a countable alphabet. Given an i.i.d. sample X_1, ..., X_n drawn from P, we consider the problem of estimating the entropy H(P) or some other functional F = F(P) of the unknown distribution P. We show that, for additive functionals satisfying mild conditions (including the cases of the mean, the entropy, and mutual information), the plug-in estimates of F are universally consistent. We also prove that, without further assumptions, no rate-of-convergence results can be obtained for any sequence of estimators. In the case of entropy estimation, under a variety of different assumptions, we obtain rate-of-convergence results for the plug-in estimate and for a nonparametric estimator based on match-lengths. The behavior of the variance and the expected error of the plug-in estimate is shown to be in sharp contrast to the finite-alphabet case. A number of other important examples of functionals are also treated in some detail.
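For concreteness, a minimal sketch of the plug-in estimate: form the empirical distribution of the sample and evaluate the entropy functional at it. The Geometric(1/2) example source below is our own illustrative choice.

```python
# Sketch of the plug-in entropy estimate on a countable alphabet: only observed
# symbols carry empirical mass, so the sum below is always finite.
from collections import Counter
import math
import numpy as np

def plugin_entropy(sample):
    """Entropy (in bits) of the empirical distribution of the sample."""
    n = len(sample)
    counts = Counter(sample)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Example: an i.i.d. Geometric(1/2) sample on the countable alphabet {1, 2, ...};
# P(k) = 2^{-k}, so the true entropy is exactly 2 bits.
rng = np.random.default_rng(1)
sample = rng.geometric(0.5, size=100_000)
print(f"plug-in estimate: {plugin_entropy(sample):.4f} bits (true value: 2)")
```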
Abstract-This paper provides an extensive study of the behavior of the best achievable rate (and other related fundamental limits) in variable-length strictly lossless compression. In the nonasymptotic regime, the fundamental limits of fixed-to-variable lossless compression with and without prefix constraints are shown to be tightly coupled. Several precise, quantitative bounds are derived, connecting the distribution of the optimal code lengths to the source information spectrum, and an exact analysis of the best achievable rate for arbitrary sources is given. Fine asymptotic results are proved for arbitrary (not necessarily prefix) compressors on general mixing sources. Nonasymptotic, explicit Gaussian approximation bounds are established for the best achievable rate on Markov sources. The source dispersion and the source varentropy rate are defined and characterized. Together with the entropy rate, the varentropy rate serves to tightly approximate the fundamental nonasymptotic limits of fixed-to-variable compression for all but very small block lengths.

Index Terms-Lossless data compression, fixed-to-variable source coding, fixed-to-fixed source coding, entropy, finite-blocklength fundamental limits, central limit theorem, Markov sources, varentropy, minimal coding variance, source dispersion.

I. FUNDAMENTAL LIMITS

A. Asymptotics: Entropy Rate

For a random source X = {P_{X^n}}, assumed for simplicity to take values in a finite alphabet A, the minimum asymptotically achievable source coding rate (bits per source sample) is the entropy rate

H(X) = lim_{n→∞} (1/n) H(X^n),    (1)

where X^n = (X_1, X_2, ..., X_n) and the information (in bits) of a random variable Z with distribution P_Z is defined as i_Z(z) = log₂ 1/P_Z(z). The foregoing asymptotic fundamental limit holds in the following settings:

1) Almost-lossless n-to-k fixed-length data compression: Provided that the source is stationary and ergodic and the encoding failure probability does not exceed ε, for 0 < ε < 1, the minimum achievable rate k/n is given by (1) as n → ∞. This is a direct consequence of the Shannon-McMillan theorem [25]. Dropping the assumption of stationarity/ergodicity, the fundamental limit is the lim sup in probability of the normalized informations [13].

2) Strictly lossless variable-length prefix data compression: Provided that the limit in (1) exists (for which stationarity is sufficient), the minimal average source coding rate converges to (1). This is a consequence of the fact that for prefix codes the average encoded length cannot be smaller than the entropy [26], and the minimal average encoded length (achieved by the Huffman code) never exceeds the entropy plus one bit. If the limit in (1) does not exist, then the asymptotic minimal average source coding rate is simply the lim sup of the normalized entropies [13]. For stationary ergodic sources, the source coding rate achieved by any prefix code is asymptotically almost surely bounded below by the entropy rate as a result of Barron's lemma [2], a bound which is achieved by the Shannon code.

3) Strictly lossless variable-length data compress...
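To illustrate how the entropy rate and the varentropy rate combine into a Gaussian approximation, the sketch below computes both for a memoryless Bernoulli source and evaluates the two-term approximation H + sqrt(V/n) Q⁻¹(ε). The exact nonasymptotic bounds in the paper contain further correction terms; this shows only the leading Gaussian shape, with illustrative parameter values.

```python
# Sketch: entropy rate H and varentropy rate V of a memoryless Bernoulli(p)
# source, and the two-term Gaussian approximation H + sqrt(V/n) * Qinv(eps)
# for the best achievable compression rate at blocklength n.
import math
from statistics import NormalDist

p = 0.11
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # entropy (bits/symbol)
# varentropy: variance of the information log2(1/P(X))
i0, i1 = -math.log2(1 - p), -math.log2(p)              # information values
V = (1 - p) * (i0 - H) ** 2 + p * (i1 - H) ** 2

def Qinv(eps):
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1 - eps)

n, eps = 1000, 0.1
rate_approx = H + math.sqrt(V / n) * Qinv(eps)
print(f"H = {H:.4f} bits, V = {V:.4f}, approx rate at n={n}: {rate_approx:.4f}")
```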
Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: the plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.

METHODOLOGY: Three new entropy estimators are introduced: two new LZ-based estimators, and the "renewal entropy estimator," which is tailored to data generated by a binary renewal process. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters in practice.

THEORY: We prove that, unlike their earlier versions, the two new LZ-based estimators are universally consistent, that is, they converge to the entropy rate for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state hidden Markov model (HMM) with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator.

SIMULATION: All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. The main conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency; with small word-lengths it fails to detect longer-range structure in the data, and with longer word-lengths the empirical distribution is severely undersampled, leading to large biases.
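The word-length trade-off in conclusion (iv) can be seen directly in a toy implementation of the plug-in method: estimate the entropy of overlapping w-blocks from their empirical distribution and divide by w. The implementation and parameters below are our own illustration, not code from the paper.

```python
# Sketch of the plug-in method with word-length w. For an i.i.d. fair-bit
# source the true rate is 1 bit/symbol; with large w the 2^w possible blocks
# are severely undersampled and the estimate is biased downward.
from collections import Counter
import math
import random

def plugin_rate(x, w):
    """Empirical entropy of overlapping w-blocks, in bits per symbol."""
    blocks = [tuple(x[i:i + w]) for i in range(len(x) - w + 1)]
    n = len(blocks)
    counts = Counter(blocks)
    H_w = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return H_w / w

random.seed(2)
x = [random.choice([0, 1]) for _ in range(20_000)]
for w in [1, 4, 8, 16]:
    print(f"w = {w:2d}: plug-in estimate = {plugin_rate(x, w):.3f} bits/symbol")
```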
A general methodology is introduced for the construction and effective application of control variates to estimation problems involving data from reversible Markov chain Monte Carlo samplers. We propose the use of a specific class of functions as control variates, and we introduce a new consistent estimator for the values of the coefficients of the optimal linear combination of these functions. For a specific Markov chain Monte Carlo scenario, the form and proposed construction of the control variates are shown to provide an exact solution of the associated Poisson equation. This implies that the estimation variance in this case (in the central limit theorem regime) is exactly zero. The new estimator is derived from a novel, finite dimensional, explicit representation for the optimal coefficients. The resulting variance reduction methodology is primarily (though certainly not exclusively) applicable when the simulated data are generated by a random-scan Gibbs sampler. Markov chain Monte Carlo examples of Bayesian inference problems demonstrate that the corresponding reduction in the estimation variance is significant, and that in some cases it can be quite dramatic. Extensions of this methodology are discussed and simulation examples are presented illustrating the utility of the methods proposed. All methodological and asymptotic arguments are rigorously justified under essentially minimal conditions.
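A minimal sketch of the underlying idea, under our own simplifying assumptions: any function of the form G = g - Pg has mean zero under the stationary distribution, so θG can be subtracted from the target functional without introducing bias, with θ chosen by least squares. On a toy reversible AR(1) chain, taking g(x) = x² makes θG solve the Poisson equation for F(x) = x² exactly, so the controlled estimator hits the true value with zero asymptotic variance, mirroring the exact-solution case described above.

```python
# Sketch of the control-variate idea for a reversible chain where Pg is known
# in closed form. All names and parameters here are illustrative.
import numpy as np

rho, sigma = 0.9, 1.0
rng = np.random.default_rng(3)

# Simulate the reversible AR(1) chain X_{t+1} = rho * X_t + sigma * Z_t.
n = 100_000
X = np.empty(n)
X[0] = 0.0
for t in range(n - 1):
    X[t + 1] = rho * X[t] + sigma * rng.standard_normal()

F = X ** 2                                       # target: estimate E_pi[X^2]
G = X ** 2 - (rho ** 2 * X ** 2 + sigma ** 2)    # G = g - Pg for g(x) = x^2

theta = np.cov(F, G)[0, 1] / np.var(G, ddof=1)   # least-squares coefficient
print(f"plain estimate:        {F.mean():.4f}")
print(f"control-variate value: {(F - theta * G).mean():.4f}")
print(f"true value:            {sigma ** 2 / (1 - rho ** 2):.4f}")
```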
Abstract-Shannon's celebrated source coding theorem can be viewed as a "one-sided law of large numbers." We formulate second-order noiseless source coding theorems for the deviation of the codeword lengths from the entropy. For a class of sources that includes Markov chains we prove a "one-sided central limit theorem" and a law of the iterated logarithm.
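A quick Monte Carlo sketch of this second-order picture: for the idealized codeword lengths -log₂ P(X₁, ..., X_n), the deviation from nH is of order √n with Gaussian fluctuations. The two-state Markov chain and all parameters below are illustrative choices, not taken from the paper.

```python
# Sketch: normalized deviations of the ideal codeword lengths -log2 P(X^n)
# from n*H for a two-state Markov chain, simulated over many trials.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])        # stationary distribution of P
logP = np.log2(P)
H = -(pi[:, None] * P * logP).sum()  # entropy rate of the chain, in bits

n, trials = 2000, 5000
rng = np.random.default_rng(4)
U = rng.random((trials, n))

x = np.empty((trials, n), dtype=int)
x[:, 0] = (U[:, 0] < pi[1]).astype(int)          # stationary start
for k in range(1, n):                            # step all trials at once
    x[:, k] = (U[:, k] < P[x[:, k - 1], 1]).astype(int)

logp = np.log2(pi[x[:, 0]]) + logP[x[:, :-1], x[:, 1:]].sum(axis=1)
devs = (-logp - n * H) / np.sqrt(n)              # normalized length deviations

# The deviations should look Gaussian with mean ~ 0; their standard deviation
# estimates the square root of the minimal coding variance.
print("mean:", devs.mean(), " std:", devs.std())
```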