2020
DOI: 10.3390/a13040103

Practical Grammar Compression Based on Maximal Repeats

Abstract: This study presents an analysis of RePair, which is a grammar compression algorithm known for its simple scheme, while also being practically effective. First, we show that the main process of RePair, that is, the step by step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-Re…
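The pair-substitution step the abstract describes can be illustrated with a minimal sketch. This is a naive, quadratic-time illustration and not the authors' implementation: the function names, the "X1", "X2", ... nonterminal names, the left-to-right tie-breaking, and the stopping rule (stop when no pair occurs at least twice) are all assumptions made for the example.

```python
def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent symbol pair."""
    counts = {}
    last_counted = {}  # pair -> index of its most recently counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # Skip an occurrence that overlaps the previously counted one (e.g. "aaa").
        if last_counted.get(pair) == i - 1:
            continue
        counts[pair] = counts.get(pair, 0) + 1
        last_counted[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of `pair` with `symbol`, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Repeatedly substitute the most frequent pair until none repeats."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # max() returns the first maximal entry, so ties go to the leftmost pair.
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break
        symbol = f"X{len(rules) + 1}"  # hypothetical nonterminal name
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the substring it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


seq, rules = repair("abracadabra")
print(seq, rules)
```

Expanding every symbol of the returned sequence with `expand` reproduces the input string, which is a quick way to check that a run of the sketch is lossless.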

Cited by 4 publications (6 citation statements)
References 28 publications
“…We first prove that the size of the smallest grammar of the n-th Fibonacci word F_n is n. We then prove that applying any implementation of RePair to F_n always provides a smallest grammar of F_n, and conversely, only such grammars can be the smallest for Fibonacci words. This was partially observed earlier in the experiments by Furuya et al. [14], where five different implementations of RePair produced grammars of the same size for the fib41 string from the Repetitive Corpus of the Pizza&Chili Corpus (http://pizzachili.dcc.uchile.cl/repcorpus.html). However, to our knowledge, this paper is the first that gives theoretical evidence.…”
Section: Introduction (supporting)
confidence: 62%
“…The cases (7-16) for P_n and Q_n can be proven similarly as the above cases (1-6) for F_n. We briefly explain the ideas of our proofs below, but the complete proof for cases (7-16) will appear in the full version of this paper. We again utilize LZ-factorizations as in the proofs for F_n in Section 5 to show the non-optimality of strategies for P_n and Q_n.…”
Section: Non-optimality of Strategies for P_n and Q_n (mentioning)
confidence: 66%
“…We can adapt our algorithm to compute the MR-Re-Pair grammar scheme proposed by Furuya et al [18]. The difference to Re-Pair is that MR-Re-Pair replaces the most frequent maximal repeat instead of the most frequent bigram, where a maximal repeat is a reoccurring substring of the text whose frequency decreases when extending it to the left or to the right.…”
Section: Computing MR-Re-Pair in Small Space (mentioning)
confidence: 99%
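The definition of a maximal repeat quoted above can be checked by brute force on a small string. The sketch below is only an illustration of that definition, not the MR-Re-Pair algorithm itself: the function names are hypothetical, occurrences are counted with overlaps (an assumption the quote does not settle), and the running time is far from what a practical implementation would need.

```python
def occurrences(text, w):
    """Count occurrences of w in text, overlapping ones included."""
    count = 0
    start = text.find(w)
    while start != -1:
        count += 1
        start = text.find(w, start + 1)
    return count


def is_maximal_repeat(text, w):
    """True if w occurs at least twice and every one-character
    extension to the left or right occurs strictly less often."""
    occ = occurrences(text, w)
    if occ < 2:
        return False
    alphabet = set(text)
    return all(
        occurrences(text, c + w) < occ and occurrences(text, w + c) < occ
        for c in alphabet
    )


def maximal_repeats(text):
    """Map every maximal repeat of text to its number of occurrences."""
    found = {}
    n = len(text)
    for i in range(n):
        for j in range(i + 1, n + 1):
            w = text[i:j]
            if w not in found and is_maximal_repeat(text, w):
                found[w] = occurrences(text, w)
    return found


print(maximal_repeats("abracadabra"))
```

On "abracadabra", for instance, "ab" occurs twice but is not a maximal repeat, because extending it to "abr" leaves the count unchanged; "abra" is maximal because every extension occurs only once or not at all.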
“…Ganczorz and Jez [16] modified the Re-Pair grammar by disfavoring the replacement of bigrams that cross Lempel-Ziv-77 (LZ77) [17] factorization borders, which allowed the authors to achieve practically smaller grammar sizes. Recently, Furuya et al [18] presented a variant, called MR-Re-Pair, in which a most frequent maximal repeat is replaced instead of a most frequent bigram.…”
Section: Introduction (mentioning)
confidence: 99%