Huffman coding finds a prefix code that minimizes mean codeword length for a given probability distribution over a finite number of items. Campbell generalized the Huffman problem to a family of problems in which the goal is to minimize not mean codeword length $\sum_i p_i l_i$ but rather a generalized mean of the form $\varphi^{-1}\left(\sum_i p_i \varphi(l_i)\right)$, where $l_i$ denotes the length of the $i$th codeword, $p_i$ denotes the corresponding probability, and $\varphi$ is a monotonically increasing cost function. Such generalized means, also known as quasiarithmetic or quasilinear means, have a number of diverse applications, including applications in queueing. Several quasiarithmetic-mean problems have novel simple redundancy bounds in terms of a generalized entropy. A related property involves the existence of optimal codes: For "well-behaved" cost functions, optimal codes always exist for (possibly infinite-alphabet) sources having finite generalized entropy. Solving finite instances of such problems is done by generalizing an algorithm for finding length-limited binary codes to a new algorithm for finding optimal binary codes for any quasiarithmetic mean with a convex cost function. This algorithm can be performed using quadratic time and linear space, and can be extended to other penalty functions, some of which are solvable with similar space and time complexity, and others of which are solvable with slightly greater complexity. This reduces the computational complexity of a problem involving minimum delay in a queue, allows combinations of previously considered problems to be optimized, and greatly expands the space of problems solvable in quadratic time and linear space. The algorithm can be extended for purposes such as breaking ties among possibly different optimal codes, as with bottom-merge Huffman coding.
Index Terms: Optimal prefix code, Huffman algorithm, generalized entropies, generalized means, quasiarithmetic means, queueing.
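To make the objective concrete, the following is a minimal sketch of evaluating a quasiarithmetic mean for a fixed code. The function name `quasiarithmetic_mean`, the example probabilities, and the choice of the exponential cost with base 2 are illustrative assumptions, not taken from the paper; the sketch only evaluates the penalty and does not search for an optimal code.

```python
import math

def quasiarithmetic_mean(probs, lengths, phi, phi_inv):
    """Evaluate phi_inv(sum_i p_i * phi(l_i)) for given codeword lengths."""
    return phi_inv(sum(p * phi(l) for p, l in zip(probs, lengths)))

# Hypothetical source and a complete prefix code's lengths (Kraft sum = 1).
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

# Linear cost phi(x) = x recovers the ordinary mean codeword length.
mean_length = quasiarithmetic_mean(probs, lengths, lambda x: x, lambda y: y)

# Convex exponential cost phi(x) = 2**x, with inverse log2 (assumed example).
exp_mean = quasiarithmetic_mean(probs, lengths, lambda x: 2.0 ** x, math.log2)

print(mean_length, exp_mean)
```

Because the exponential cost is convex, Jensen's inequality makes the second value at least as large as the first; long codewords are penalized more heavily than under the linear cost.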
Let P = {p(i)} be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countable number of inputs prevents usage of the Huffman algorithm, there are nontrivial P for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, β-exponential means, those of the form $\log_a \sum_i p(i) a^{n(i)}$, where n(i) is the length of the ith codeword and a is a positive constant. Applications of such minimizations include a novel problem of maximizing the chance of message receipt in single-shot communications (a < 1) and a previously known problem of minimizing the chance of buffer overflow in a queueing system (a > 1). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions and both are extended to alphabetic codes, as well as to minimizing maximum pointwise redundancy. The aforementioned application of minimizing the chance of buffer overflow is also considered.
A framework with two scalar parameters is introduced for various problems of finding a prefix code minimizing a coding penalty function. The framework encompasses problems previously proposed by Huffman, Campbell, Nath, and Drmota and Szpankowski, shedding light on the relationships among these problems. In particular, Nath's range of problems can be seen as bridging the minimum average redundancy problem of Huffman with the minimum maximum pointwise redundancy problem of Drmota and Szpankowski. Using this framework, two linear-time Huffman-like algorithms are devised for the minimum maximum pointwise redundancy problem, the only one in the framework not previously solved with a Huffman-like algorithm. Both algorithms provide solutions common to this problem and a subrange of Nath's problems, the second algorithm being distinguished by its ability to find the minimum variance solution among all solutions common to the minimum maximum pointwise redundancy and Nath problems. Simple redundancy bounds are also presented.
A novel lossless source coding paradigm applies to problems of unreliable lossless channels with low bit rates, in which a vital message needs to be transmitted prior to termination of communications. This paradigm can be applied to Alfréd Rényi's secondhand account of an ancient siege in which a spy was sent to scout the enemy but was captured. After escaping, the spy returned to his base in no condition to speak and unable to write. His commander asked him questions that he could answer by nodding or shaking his head, and the fortress was defended with this information. Rényi told this story with reference to prefix coding, but maximizing probability of survival in the siege scenario is distinct from yet related to the traditional source coding objective of minimizing expected codeword length. Rather than finding a code minimizing expected codeword length $\sum_{i=1}^{n} p(i) l(i)$, the siege problem involves maximizing $\sum_{i=1}^{n} p(i) \theta^{l(i)}$ for a known θ ∈ (0, 1). When there are no restrictions on codewords, this problem can be solved using a known generalization of Huffman coding. The optimal solution has coding bounds which are functions of Rényi entropy; in addition to known bounds, new bounds are derived here. The alphabetically constrained version of this problem has applications in search trees and diagnostic testing. A novel dynamic programming algorithm, based upon the oldest known algorithm for the traditional alphabetic problem, optimizes this problem in $O(n^3)$ time and $O(n^2)$ space, whereas two novel approximation algorithms can find a suboptimal solution faster: one in linear time, the other in $O(n \log n)$. Coding bounds for the alphabetic version of this problem are also presented.
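The abstract does not spell out the Huffman generalization it refers to. A minimal sketch follows, assuming it is the commonly described exponential variant in which the two smallest weights w1 and w2 are merged into a node of weight θ(w1 + w2) rather than w1 + w2. The function names and the example source are hypothetical, and this covers only the unrestricted problem, not the alphabetic one.

```python
import heapq
import itertools

def exponential_huffman_lengths(probs, theta):
    """Codeword lengths from Huffman-style merging with weight theta * (w1 + w2).

    Assumed variant: with theta in (0, 1), the root weight equals
    sum_i p(i) * theta**l(i), the quantity the siege problem maximizes.
    """
    if len(probs) == 1:
        return [0]
    tie = itertools.count()  # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        w1, _, items1 = heapq.heappop(heap)
        w2, _, items2 = heapq.heappop(heap)
        merged = items1 + items2
        for i in merged:  # every leaf under the new node gets one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (theta * (w1 + w2), next(tie), merged))
    return lengths

def success_probability(probs, lengths, theta):
    """Evaluate sum_i p(i) * theta**l(i) for the given lengths."""
    return sum(p * theta ** l for p, l in zip(probs, lengths))

probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical source
theta = 0.9                   # assumed per-bit survival probability
lengths = exponential_huffman_lengths(probs, theta)
value = success_probability(probs, lengths, theta)
fixed_value = success_probability(probs, [2, 2, 2, 2], theta)
print(lengths, value, fixed_value)
```

For this four-symbol source the only complete binary code shapes have lengths {2, 2, 2, 2} or {1, 2, 3, 3}, so comparing `value` against `fixed_value` is enough to check the result by hand.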
This paper presents new lower and upper bounds for the optimal compression of binary prefix codes in terms of the most probable input symbol, where compression efficiency is determined by the nonlinear codeword length objective of minimizing maximum pointwise redundancy. This objective relates to both universal modeling and Shannon coding, and these bounds are tight throughout the interval. The upper bounds also apply to a related objective, that of dth exponential redundancy.
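As a rough illustration of the objective, the sketch below assumes the standard definition of maximum pointwise redundancy, max over i of l(i) + log2 p(i), which the abstract does not state explicitly. It evaluates that quantity for Shannon code lengths ⌈−log2 p(i)⌉; the function names and the example source are hypothetical, and the paper's bounds themselves are not reproduced here.

```python
import math

def max_pointwise_redundancy(probs, lengths):
    """Largest per-symbol excess of codeword length over -log2 p(i)."""
    return max(l + math.log2(p) for p, l in zip(probs, lengths))

def shannon_lengths(probs):
    """Shannon code lengths ceil(-log2 p(i)); they always satisfy Kraft's inequality."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical source
lengths = shannon_lengths(probs)
redundancy = max_pointwise_redundancy(probs, lengths)
kraft_sum = sum(2.0 ** -l for l in lengths)
print(lengths, redundancy, kraft_sum)
```

Since each Shannon length exceeds −log2 p(i) by less than one bit, the Shannon code's maximum pointwise redundancy always lies in [0, 1), which gives a simple baseline that an optimal code can only match or improve on.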
Huffman coding finds an optimal prefix code for a given probability mass function. Consider situations in which one wishes to find an optimal code with the restriction that all codewords have lengths that lie in a user-specified set of lengths (or, equivalently, no codewords have lengths that lie in a complementary set). This paper introduces a polynomial-time dynamic programming algorithm that finds optimal codes for this reserved-length prefix coding problem. This has applications to quickly encoding and decoding lossless codes. In addition, one modification of the approach solves any quasiarithmetic prefix coding problem, while another finds optimal codes restricted to the set of codes with g codeword lengths for user-specified g (e.g., g = 2).