Three strongly sequential, lossless compression schemes, one with linearly growing per-letter computational complexity and two with fixed per-letter complexity, are presented and analyzed for memoryless sources with abruptly changing statistics. The first method, which improves on Willems' weighting approach, asymptotically achieves a lower bound on the redundancy, and hence is optimal. The second scheme achieves redundancy of O(log N / N) when the transitions in the statistics are large, and O(log log N / log N) otherwise. The third approach always achieves redundancy of O(log N / N). The two fixed-complexity approaches can easily be combined to achieve the better of the two redundancies. Simulation results support the analytical bounds derived for all the coding schemes.
Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive integer indices in increasing order of first occurrence.
Bounds on the entropy of patterns of sequences generated by independently identically distributed (i.i.d.) sources are derived. A pattern is a sequence of indices that contains all consecutive integer indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed by itself. The bounds derived here are functions of the i.i.d. source entropy, alphabet size, and letter probabilities. It is shown that for large alphabets, the pattern entropy must decrease from the i.i.d. one. The decrease is in many cases more significant than the universal coding redundancy bounds derived in prior works. The pattern entropy is confined between two bounds that depend on the arrangement of the letter probabilities in the probability space. For very large alphabets whose size may be greater than the coded pattern length, all low-probability letters are packed into one symbol. The pattern entropy is upper and lower bounded in terms of the i.i.d. entropy of the new packed alphabet. Correction terms, which are usually negligible, are provided for both upper and lower bounds.
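To make the definition above concrete, the following sketch (the function name `pattern` is illustrative, not from the papers) computes the pattern of a sequence by replacing each symbol with the 1-based index of its order of first occurrence:

```python
def pattern(seq):
    """Return the pattern of seq: each symbol is replaced by the
    1-based index of its order of first occurrence."""
    first_index = {}  # symbol -> index of first occurrence
    out = []
    for s in seq:
        if s not in first_index:
            first_index[s] = len(first_index) + 1
        out.append(first_index[s])
    return out

print(pattern("abracadabra"))  # → [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

Note that the pattern is independent of the actual alphabet: "abracadabra" and any relabeling of its symbols yield the same index sequence, which is why the pattern can be coded without knowing the alphabet.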