This article addresses the problem of realizing Ziv and Lempel's second textual substitution compressor [17] in hardware. We evaluate the combined effects of dictionary size and of various dictionary management mechanisms on the performance of a Ziv-Lempel encoder, then present a new dictionary manager that resolves the time-space trade-offs of its predecessors. For a good overview of both statistical and textual substitution data compressors, see Lelewer and Hirschberg's survey [7]. Fiala and Greene [5] provide extensive comparisons of the performance of the most popular encoders on large sets of test files, in addition to presenting practical implementations of Ziv and Lempel's first textual substitution encoder [16]. Related chapters in Storer's text [11] discuss textual substitution schemes within a unified framework and contain extensive bibliographies. We introduce our work with an informal description of the Ziv-Lempel algorithm, as popularized by Welch [15], and of the problem of maintaining a continuously adaptive finite dictionary in real time.
The Popular Ziv-Lempel Data Compressor
The second data compression scheme of Ziv and Lempel repeatedly matches the input stream to words contained in a dictionary, and returns pointers to the locations in the dictionary of the longest matches. Initially the dictionary contains only the single-character strings over the input alphabet. These initial dictionary elements are permanent to ensure lossless compression. After each match, the matched word concatenated with the next symbol of the remaining input stream is added to the dictionary. This process continues until the input stream is exhausted. The dictionary growth heuristic implied by the addition of the last parsed word concatenated with the first unmatched symbol causes the dictionary to contain every prefix of every word it holds. Thus, a trie is the natural data structure for real-time parsing. Typical implementations represent the trie as a table in which each entry consists of a pointer to a word's longest proper prefix (parent) and the word's last character.
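The parse-and-grow loop described above can be illustrated with a minimal software sketch. It is not the hardware design discussed in this article. It assumes byte-oriented input and an unbounded dictionary. The names `lzw_encode`, `word_at`, and the `children` lookup map are hypothetical, introduced only for this illustration. The trie is stored as a table of (parent, last character) entries, as in the typical implementations mentioned above.

```python
from typing import Dict, List, Tuple

ROOT = -1  # assumed sentinel: parent of every single-byte word


def lzw_encode(data: bytes) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (pointers, table) for a Welch-style Ziv-Lempel parse of data."""
    # Table form of the trie: entry i holds (parent index, last byte).
    # Entries 0..255 are the permanent single-byte words.
    table: List[Tuple[int, int]] = [(ROOT, b) for b in range(256)]
    # Hypothetical helper index, (parent, byte) -> entry, for fast child lookup.
    children: Dict[Tuple[int, int], int] = {}
    pointers: List[int] = []
    if not data:
        return pointers, table

    current = data[0]  # a single byte is its own dictionary index
    for byte in data[1:]:
        child = children.get((current, byte))
        if child is not None:
            current = child  # the match can be extended by one symbol
        else:
            pointers.append(current)  # emit pointer to the longest match
            # Add the matched word concatenated with the first unmatched symbol.
            children[(current, byte)] = len(table)
            table.append((current, byte))
            current = byte  # restart matching at the unmatched symbol
    pointers.append(current)  # flush the final match
    return pointers, table


def word_at(table: List[Tuple[int, int]], index: int) -> bytes:
    """Recover a dictionary word by following parent pointers to the root."""
    out = bytearray()
    while index != ROOT:
        index, byte = table[index]
        out.append(byte)
    return bytes(reversed(out))
```

Because every new entry points to an existing entry as its parent, every prefix of a stored word is itself in the table, which is the property that makes the trie representation natural. A real decoder rebuilds the same table from the pointer stream; `word_at` is included here only to make the table contents inspectable.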
Practical Implementation of a Ziv-Lempel Compressor
Realistic implementations have finite memory, so dictionary growth is bounded. Since dictionaries that adapt continuously generally provide better compression, we need an efficient dictionary management scheme. Three different schemes appear in the literature. The first of these sacrifices compression for inexpensive memory management, and the others make the opposite trade-off.
Ziv-Lempel encoders require two parameters: the input alphabet size, |Σ|, and the dictionary size, D, where D must exceed |Σ| to achieve any compression. With all textual substitution schemes there is a risk of expanding random data. Expansion for Ziv-Lempel compressors is bounded by log D / log |Σ|. Since most computer applications compress byte-oriented data, we assume Σ = {0, ..., 255} in all schemes discussed. We examine the effects of varying the second parameter, D, in the section on Performance of Deletion Heuristics.
The f...