2008
DOI: 10.1109/dcc.2008.79

Re-pair Achieves High-Order Entropy

Abstract: Re-Pair is a dictionary-based compression method invented in 1999 by Larsson and Moffat. Although its practical performance has been established through experiments, the method has resisted all attempts of formal analysis. In this paper we show that Re-Pair compresses a sequence T[1, n] over an alphabet of size σ and k-th order entropy H_k, to at most 2H_k + o(n log σ) bits, for any k = o(log_σ n).
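
To make the quantity H_k in the abstract concrete, here is a minimal Python sketch of the k-th order empirical entropy. It assumes the standard per-symbol definition (for each length-k context, take the zero-order entropy of the symbols that follow it, sum these, and divide by n); the abstract does not spell out its normalization, so that choice is an assumption. The function names `h0_bits` and `empirical_entropy` are illustrative only.

```python
from collections import Counter, defaultdict
from math import log2


def h0_bits(counts):
    """Total zero-order entropy in bits: sum over symbols of n_c * log2(n / n_c)."""
    n = sum(counts.values())
    return sum(c * log2(n / c) for c in counts.values())


def empirical_entropy(text, k):
    """Per-symbol k-th order empirical entropy of text, in bits (assumed definition)."""
    n = len(text)
    if n == 0:
        return 0.0
    # For every length-k context, count the symbols that immediately follow it.
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[text[i - k:i]][text[i]] += 1
    return sum(h0_bits(c) for c in followers.values()) / n
```

Under this definition, n * empirical_entropy(T, k) is the total number of bits a statistical coder with k symbols of context would aim for, which is the scale on which a bound of this kind is measured.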

Cited by 21 publications (19 citation statements)
References 15 publications (23 reference statements)
“…mr|, (1/2) g_rp < g_mr ≤ g_rp follows Equations (11) and (12), and thus, the proposition holds. g_mr = g_rp holds when every length l…”
Section: Definition 3 (Mr-repair), mentioning
confidence: 67%
“…Despite its simple scheme, RePair is known for its high compression in practice [3][4][5], and hence, it has been comprehensively studied. Some examples of studies on the RePair algorithm include its extension to an online algorithm [6], practical working time/space improvements [7,8], applications to various fields [3,9,10], and theoretical analysis of generated grammar sizes [1,11,12].…”
Section: Introduction, mentioning
confidence: 99%
“…(It is sometimes desirable for the CFG to be in Chomsky normal form (CNF), in which case it is also known as a straight-line program.) We can measure our success in terms of universality [5], empirical entropy [6] or the ratio between the size of our CFG and the size g = Ω(log n) of the smallest such grammar. In this paper we consider the third and last measure.…”
Section: Introduction, mentioning
confidence: 99%
“…The Re-Pair algorithm is an algorithm with O(n) time complexity; it is easy to implement using linked lists and a priority queue. Further, it was shown in [18] that it can compress an input message of length n over an alphabet of size |Σ| into at most 2H_k + o(n log |Σ|) bits, where H_k is k-th order entropy.…”
Section: Previous Work, mentioning
confidence: 99%
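
The last statement above mentions an O(n) implementation built on linked lists and a priority queue. The Python sketch below is not that implementation: it is a deliberately naive version that rescans the sequence on every round, meant only to illustrate the Re-Pair rule of repeatedly replacing the most frequent adjacent pair with a fresh nonterminal until no pair occurs twice. The function names and the `("R", id)` encoding of nonterminals are assumptions made for this illustration.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, left to right."""
    counts = Counter()
    last_counted = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run such as "aaa", the second (a, a) overlaps the first; skip it.
        if last_counted.get(pair) == i - 1:
            del last_counted[pair]
            continue
        counts[pair] += 1
        last_counted[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace every non-overlapping occurrence of pair by symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Naive Re-Pair: returns (compressed sequence, rules). Not linear time."""
    seq, rules, next_id = list(text), {}, 0
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # no pair repeats, so a new rule would not save anything
            break
        symbol = ("R", next_id)  # tuples cannot collide with input characters
        next_id += 1
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol
```

Calling `repair("abracadabra abracadabra")` returns a shortened sequence together with its rules, and joining `expand(s, rules)` over that sequence reproduces the input. Each round here costs time linear in the current sequence length, so the whole procedure is quadratic in the worst case.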