A universal variable-to-fixed length source code based on Lawrence's algorithm

Tjalkens, T.J.; Willems, F.M.J.

doi:10.1109/18.119684

Cited by 20 publications

(21 citation statements)

References 5 publications

“…If the uncertainty on the source distribution can be modeled by a class of distributions, it was shown by B. Fitingof in [83] and by L. Davisson in [84] that for some uncertainty classes there is no asymptotic loss of compression efficiency if we use a source code tuned to the "center of gravity" of the uncertainty set. Constructive methods for various restricted classes of sources (such as memoryless and Markov) have been proposed by R. Krichevsky and V. Trofimov [59] and by T. Tjalkens and F. Willems [85].…”

Section: E Universal Source Coding

mentioning

confidence: 99%

Fifty years of Shannon theory

Verdú

1998

IEEE Trans. Inform. Theory

264

135

Abstract: A brief chronicle is given of the historical development of the central problems in the theory of fundamental limits of data compression and reliable communication.

Index Terms: Channel capacity, data compression, entropy, history of Information Theory, reliable communication, source coding.

Claude Shannon's "A mathematical theory of communication" [1] published in July and October of 1948 is the Magna Carta of the information age. Shannon's discovery of the fundamental laws of data compression and transmission marks the birth of Information Theory. A unifying theory with profound intersections with Probability, Statistics, Computer Science, and other fields, Information Theory continues to set the stage for the development of communications, data storage and processing, and other information technologies.

This overview paper gives a brief tour of some of the main achievements in Information Theory. It confines itself to those disciplines directly spawned from [1], now commonly referred to as Shannon theory.

Section I frames the revolutionary nature of "A mathematical theory of communication," in the context of the rudimentary understanding of the central problems of communication theory available at the time of its publication.

Section II is devoted to lossless data compression: the amount of information present in a source and the algorithms developed to achieve the optimal compression efficiency predicted by the theory.

Section III considers channel capacity: the rate at which reliable information can be transmitted through a noisy channel.

Section IV gives an overview of lossy data compression: the fundamental tradeoff of information rate and reproduction fidelity.

The paper concludes with a list of selected points of tangency of Information Theory with other fields.

Publisher Item Identifier S 0018-9448(98)06315-9.

• Frequency Modulation (Armstrong, 1936);
• Pulse-Code Modulation (PCM) (Reeves, 1937-1939);
• Vocoder (Dudley, 1939);
• Spread Spectrum (1940's).

In those systems we find some of the ingredients that would be key to the inception of information theory: a) the Morse code gave an efficient way to encode information taking into account the frequency of the symbols to be encoded; b) systems such as FM, PCM, and spread spectrum illustrated that transmitted bandwidth is just another degree of freedom available to the engineer in the quest for more reliable communication; c) PCM was the first digital communication system used to transmit analog continuous-time signals; d) at the expense of reduced fidelity, the bandwidth used by the Vocoder [2] was less than the message bandwidth.

In 1924, H. Nyquist [3] argued that the transmission rate is proportional to the logarithm of the number of signal levels in a unit duration. Furthermore, he posed the question of how much improvement in telegraphy transmission rate could be achieved by replacing the Morse code by an "optimum" code. R. Hartley's 1928 paper [10] uses terms such as "rate of communication," "intersymbol interference," and "capacity of a s...

“…We further provide a converse, showing that this rate is optimal up to the third-order term. * [3,4,5,6]. Upper and lower bounds on the redundancy of a universal code for the class of all memoryless sources is derived in [3].…”

mentioning

confidence: 99%

Fundamental limits of universal variable-to-fixed length coding of parametric sources

Iri

Kosut

2017

2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

Universal variable-to-fixed (V-F) length coding of d-dimensional exponential family of distributions is considered. We propose an achievable scheme consisting of a dictionary, used to parse the source output stream, making use of the previously-introduced notion of quantized types. The quantized type class of a sequence is based on partitioning the space of minimal sufficient statistics into cuboids. Our proposed dictionary consists of sequences in the boundaries of transition from low to high quantized type class size. We derive the asymptotics of the ε-coding rate of our coding scheme for large enough dictionaries. In particular, we show that the third-order coding rate of our scheme is $H \frac{d}{2} \frac{\log \log M}{\log M}$, where H is the entropy of the source and M is the dictionary size. We further provide a converse, showing that this rate is optimal up to the third-order term.

* [3,4,5,6]. Upper and lower bounds on the redundancy of a universal code for the class of all memoryless sources is derived in [3]. Universal V-F length coding of the class of all binary memoryless sources is then considered in [4,5], where [5] provides an asymptotically average sense optimal algorithm. Later, optimal redundancy for V-F length compression of the class of Markov sources is derived in [6]. Performance of V-F length codes and fixed-to-variable (F-V) length codes for compression of the class of Markov sources is compared in [7] and a dictionary construction that asymptotically achieves the optimal error exponent is proposed.

All previous works consider model classes that include all distributions within a simplex. However, universal V-F length coding for more structured model classes has not been considered in the literature. Apart from extending the topological complexities, we further adopt more general metrics of performance. Delay-sensitive modern applications reflect new requirements on the performance of compression schemes. Therefore it is vital to characterize the overhead associated with operation in the non-asymptotic regime. Over the course of probing the non-asymptotics, incurring "errors" are inevitable. Therefore, we depart from classical average-case (redundancy) and worst case (regret) analysis to the modern probabilistic analysis, where the figure of merit in our setup is the ε-coding rate: the minimum rate such that the corresponding overflow probability is less than ε. Our goal is to analyze asymptotics of the ε-coding rate as the size of the dictionary increases. We provide an achievable scheme for compressing d-dimensional exponential family of distributions as the parametric model class. Moreover, we provide a converse result, showing that our proposed scheme is optimal up to the third-order ε-coding rate.

In previous universal V-F length codes, one can define a notion of complexity for sequences. In [3,4,5,6], a sequence with high complexity has low probability under a certain composite or mixture source. While in [7], high complexity sequences have high scaled (by sequence length) empirical entropy. The diction...

“…Savari [25] later published a non-asymptotic analysis of the Tunstall code for binary, memoryless sources with small entropies. Universal variable-to-fixed length codes were analyzed in [39,21,20,19,38,43]; however, we are unaware of analyses of the minimax redundancy for variable-to-fixed and variable-to-variable length codes, and these problems remain open. Finally Tjalkens has studied constructions of Tunstall codes in [34] and [36] and Kieffer focused on the problem of binary sources in [17].…”

Section: Introduction

mentioning

confidence: 99%

Tunstall Code, Khodak Variations, and Random Walks

Drmota

Reznik

Szpankowski

2010

IEEE Trans. Inform. Theory

A variable-to-fixed length encoder partitions the source string into variable-length phrases that belong to a given and fixed dictionary. Tunstall, and independently Khodak, designed variable-to-fixed length codes for memoryless sources that are optimal under certain constraints. In this paper, we study the Tunstall and Khodak codes using analytic information theory, i.e., the machinery from the analysis of algorithms literature. After proposing an algebraic characterization of the Tunstall and Khodak codes, we present new results on the variance and a central limit theorem for dictionary phrase lengths. This analysis also provides a new argument for obtaining asymptotic results about the mean dictionary phrase length and average redundancy rates.
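The parsing idea in this abstract can be illustrated with a minimal sketch of the classical Tunstall construction for a known memoryless source. This is the textbook non-universal construction, not the Lawrence-based universal code of Tjalkens and Willems. The function names `tunstall_dictionary` and `encode`, the single-character symbol representation, and the example probabilities are assumptions made for illustration only.

```python
import heapq
from itertools import count


def tunstall_dictionary(probs, max_size):
    """Build a Tunstall dictionary for a memoryless source.

    probs: dict mapping single-character symbols to probabilities (assumed known).
    max_size: maximum number of dictionary phrases (hypothetical parameter name).
    """
    k = len(probs)
    if max_size < k:
        raise ValueError("dictionary must hold at least one phrase per symbol")
    tie = count()  # tie-breaker so the heap never compares phrases directly
    # Max-heap on phrase probability, stored as negated values for heapq.
    heap = [(-p, next(tie), s) for s, p in probs.items()]
    heapq.heapify(heap)
    size = k
    # Each expansion replaces one leaf by k children: net growth of k - 1.
    while size + (k - 1) <= max_size:
        neg_p, _, phrase = heapq.heappop(heap)  # most probable phrase so far
        for s, p in probs.items():
            heapq.heappush(heap, (neg_p * p, next(tie), phrase + s))
        size += k - 1
    return sorted(phrase for _, _, phrase in heap)


def encode(source, dictionary):
    """Parse `source` into dictionary phrases and return their indices.

    Returns (indices, tail), where tail is any unparsed suffix that does not
    yet complete a phrase.
    """
    index = {phrase: i for i, phrase in enumerate(dictionary)}
    codes, current = [], ""
    for sym in source:
        current += sym
        if current in index:  # phrases are the leaves of a complete tree
            codes.append(index[current])
            current = ""
    return codes, current


dictionary = tunstall_dictionary({"a": 0.7, "b": 0.3}, max_size=4)
codes, tail = encode("aaabab", dictionary)
```

Each index can then be written with a fixed number of bits, ceil(log2 M) for a dictionary of M phrases, which is what makes the code variable-to-fixed: the phrases vary in length while the codewords do not.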
