Abstract-We present a development of parts of rate-distortion theory and pattern-matching algorithms for lossy data compression, centered around a lossy version of the Asymptotic Equipartition Property (AEP). This treatment closely parallels the corresponding development in lossless compression, a point of view advanced in an important 1989 paper of Wyner and Ziv. In the lossless case we review how the AEP underlies the analysis of the Lempel-Ziv algorithm by viewing it as a random code and reducing it to the idealized Shannon code. This also provides information about the redundancy of the Lempel-Ziv algorithm and about the asymptotic behavior of several relevant quantities. In the lossy case we give various versions of the statement of the generalized AEP and outline the general methodology of its proof via large deviations. Its relationship with Barron and Orey's generalized AEP is also discussed. The lossy AEP is applied to: (i) prove strengthened versions of Shannon's direct source coding theorem and of universal coding theorems; (ii) characterize the performance of "mismatched" codebooks in lossy data compression; (iii) analyze the performance of pattern-matching algorithms for lossy compression (including Lempel-Ziv schemes); (iv) determine the first-order asymptotics of waiting times (with distortion) between stationary processes; (v) characterize the best achievable rate of "weighted" codebooks as an optimal sphere-covering exponent. We then present a refinement of the lossy AEP and use it to: (i) prove second-order (direct and converse) lossy source coding theorems, including universal coding theorems; (ii) characterize which sources are quantitatively easier to compress; (iii) determine the second-order asymptotics of waiting times between stationary processes; (iv) determine the precise asymptotic behavior of longest match-lengths between stationary processes. Extensions to random fields are also given.
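As a point of reference, one common form of the generalized (lossy) AEP can be sketched as follows; the notation here is ours, and the precise conditions and more general versions appear in the paper. For a stationary ergodic source {X_n} with distribution P, an i.i.d. codebook distribution Q on the reproduction alphabet, and a distortion level D,
\[
  -\frac{1}{n}\,\log Q^n\bigl(B(X_1^n, D)\bigr) \;\longrightarrow\; R(P, Q, D)
  \quad \text{a.s., as } n \to \infty,
  \qquad
  B(x_1^n, D) = \bigl\{\, y_1^n : \rho_n(x_1^n, y_1^n) \le D \,\bigr\},
\]
where B(x_1^n, D) is the distortion ball of radius D around the source string under the distortion measure rho_n. In the memoryless case, minimizing the rate function R(P, Q, D) over the codebook distribution Q recovers the rate-distortion function R(D), which is what ties this statement to the coding theorems listed above.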
Abstract-We characterize the best achievable performance of lossy compression algorithms operating on arbitrary random sources, and with respect to general distortion measures. Direct and converse coding theorems are given for variable-rate codes operating at a fixed distortion level, emphasizing: a) nonasymptotic results, b) optimal or near-optimal redundancy bounds, and c) results with probability one. This development is based in part on the observation that there is a precise correspondence between compression algorithms and probability measures on the reproduction alphabet. This is analogous to the Kraft inequality in lossless data compression. In the case of stationary ergodic sources our results reduce to the classical coding theorems. As an application of these general results, we examine the performance of codes based on mixture codebooks for discrete memoryless sources. A mixture codebook (or Bayesian codebook) is a random codebook generated from a mixture over some class of reproduction distributions. We demonstrate the existence of universal mixture codebooks, and show that it is possible to universally encode memoryless sources with redundancy of approximately (k/2) log n bits, where k is the dimension of the simplex of probability distributions on the reproduction alphabet.
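To make the codes-to-measures correspondence concrete, here is an informal sketch in our own notation; the paper's statements are nonasymptotic and more precise. In lossless compression, the Kraft inequality identifies admissible codelength functions with (sub)probability measures on the source alphabet,
\[
  \sum_x 2^{-\ell(x)} \le 1
  \quad\Longleftrightarrow\quad
  \ell(x) \ge -\log_2 Q(x) \ \text{ for some probability measure } Q,
\]
and the lossy analogue replaces point masses by measures of distortion balls: a variable-rate code operating at distortion level D corresponds, up to lower-order terms, to a probability measure Q_n on the reproduction alphabet with codelengths
\[
  \ell_n(x_1^n) \;\approx\; -\log_2 Q_n\bigl(B(x_1^n, D)\bigr),
\]
where B(x_1^n, D) is the set of reproduction strings within distortion D of the source string x_1^n. In this language, the (k/2) log n redundancy of the mixture codebooks can be read, roughly, as the price of not knowing the best reproduction distribution in advance.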
Abstract-The problem of approximating the distribution of a sum S_n = Y_1 + ... + Y_n of n discrete random variables Y_i by a Poisson or a compound Poisson distribution arises naturally in many classical and current applications, such as statistical genetics, dynamical systems, the recurrence properties of Markov processes, and reliability theory. Using information-theoretic ideas and techniques, we derive a family of new bounds for compound Poisson approximation. We take an approach similar to that of Kontoyiannis, Harremoës and Johnson (2003), and we generalize some of their Poisson approximation bounds to the compound Poisson case. Partly motivated by these results, we derive a new logarithmic Sobolev inequality for the compound Poisson measure and use it to prove measure-concentration bounds for a large class of discrete distributions.
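For context, the approximating object has the following standard definition (not specific to this paper): the compound Poisson distribution CP(lambda, Q) with rate lambda > 0 and jump distribution Q is the law of a Poisson number of i.i.d. summands,
\[
  \mathrm{CP}(\lambda, Q) \;=\; \text{law of } \sum_{j=1}^{N} U_j,
  \qquad N \sim \mathrm{Poisson}(\lambda), \qquad
  U_1, U_2, \ldots \overset{\text{i.i.d.}}{\sim} Q,
\]
with N independent of the U_j. The approximation bounds described in the abstract quantify, in an appropriate distance (for example total variation or relative entropy), how close the law of S_n is to a suitably chosen CP(lambda, Q).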