Yuto Nakashima scite author profile

We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture (Kolpakov & Kucherov (FOCS '99)), which states that the maximum number of runs ρ(n) in a string of length n is less than n. The proof is remarkably simple, considering the numerous endeavors to tackle this problem in the last 15 years, and significantly improves our understanding of how runs can occur in strings. In addition, we obtain an upper bound of 3n for the maximum sum of exponents σ(n) of runs in a string of length n, improving on the best known bound of 4.1n by Crochemore et al. (JDA 2012), as well as other improved bounds on related problems. The characterization also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string. We also establish a relationship between runs and nodes of the Lyndon tree, which gives a simple optimal solution to the 2-Period Query problem that was recently solved by Kociumaka et al. (SODA 2015).

A preliminary version of this paper has appeared in [1].
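To make the notion of a run concrete, here is a minimal brute-force sketch. It is not the linear-time, Lyndon-based algorithm from the abstract. It assumes the standard definition: a run is a substring with smallest period p and length at least 2p whose periodicity cannot be extended to the left or right. The function names `smallest_period` and `runs` are illustrative only.

```python
def smallest_period(s):
    """Return the smallest p >= 1 such that s[k] == s[k + p] for all valid k."""
    for p in range(1, len(s) + 1):
        if all(s[k] == s[k + p] for k in range(len(s) - p)):
            return p


def runs(w):
    """Return all runs of w as sorted (start, end, period) triples, end exclusive.

    Brute force: for each candidate period p, find maximal stretches where
    w[k] == w[k + p], then keep those that are long enough and whose
    smallest period is exactly p.
    """
    n = len(w)
    found = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            # Extend the stretch of matches as far right as possible.
            j = i
            while j + p < n and w[j] == w[j + p]:
                j += 1
            start, end = i, j + p  # w[start:end] has period p and is maximal for p
            if end - start >= 2 * p and smallest_period(w[start:end]) == p:
                found.add((start, end, p))
            i = j + 1
    return sorted(found)


if __name__ == "__main__":
    w = "mississippi"
    for start, end, p in runs(w):
        print(w[start:end], "period", p, "exponent", (end - start) / p)
```

On small inputs this can be used to check the bound ρ(n) < n empirically, for example by counting `len(runs(w))` over all binary strings of a given length.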


A new characterization of maximal repetitions by Lyndon trees

Bannai

Tomohiro

Inenaga

et al. 2014


Grammar Index by Induced Suffix Sorting

Akagi

Köppl

Nakashima

et al. 2021


Algorithms and combinatorial properties on shortest unique palindromic substrings

Inoue

Nakashima

Mieno

et al. 2018

Journal of Discrete Algorithms


MR-RePair: Grammar Compression Based on Maximal Repeats

Furuya

Takagi

Nakashima

et al. 2019


arXiv:1811.04596v2 [cs.DS], 18 Feb 2019

We analyze the grammar generation algorithm of the RePair compression algorithm and show the relation between a grammar generated by RePair and maximal repeats. We reveal that RePair replaces, step by step, the most frequent pairs within the corresponding most frequent maximal repeats. Then, we design a novel variant of RePair, called MR-RePair, which substitutes the most frequent maximal repeats at once instead of substituting the most frequent pairs consecutively. We implemented MR-RePair and compared the size of the grammar generated by MR-RePair to that generated by RePair on several text corpora. Our experiments show that MR-RePair generates more compact grammars than RePair does, especially for highly repetitive texts.

Introduction

Grammar compression is a method of lossless data compression that reduces the size of a given text by constructing a small context-free grammar that uniquely derives the text. While the problem of generating the smallest such grammar is NP-hard [6], several approximation techniques have been proposed. Among them, RePair [11] is known as an off-line method that achieves a high compression ratio in practice [7,9,20], despite its simple scheme. There have been many studies concerning RePair, such as extending it to an online algorithm [13], improving its practical running time or space [5,17], applying it to other fields [7,12,18], and analyzing the generated grammar size theoretically [6,15,16].

Recently, maximal repeats have been considered as a measure for estimating how repetitive a given string is: Belazzougui et al. [4] showed that the number of extensions of maximal repeats is an upper bound on the number of runs in the Burrows-Wheeler transform and the number of factors in the Lempel-Ziv parsing. Also, several index structures whose size is bounded by the number of extensions of maximal repeats have been proposed [2,3,19].

In this paper, we analyze the properties of RePair with regard to its relationship to maximal repeats. As stated above, several works have studied RePair, but, to the best of our knowledge, none of them associate RePair with maximal repeats. Moreover, we propose a grammar compression algorithm, called MR-RePair, that focuses on the property of maximal repeats. Prior to this work, several off-line grammar compression schemes focusing on (non-maximal) repeats have been proposed [1,10,14]. Very recently, Gańczorz and Jeż proposed a heuristic to improve the compression ratio of RePair with regard to the grammar size [8]. However, none of these techniques use the properties of maximal repeats. We show that, under a specific condition, there is a theoretical guarantee that the size of the grammar generated by MR-RePair is smaller than or equal to that generated by RePair. We also confirmed the effectiveness of MR-RePair compared to RePair through computational experiments.

Contributions: The primary contributions of this study are as follows.

2. We design a novel variant of RePair called MR-RePair, which is based on substituting the ...
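For orientation, the pair-replacement loop of plain RePair mentioned in the abstract above can be sketched as follows. This is a naive quadratic-time illustration, not the linear-time RePair implementation and not MR-RePair itself. The names `count_pairs`, `replace_pair`, `repair`, and `expand`, and the nonterminal labels `X1`, `X2`, ..., are assumptions made for this example; ties between equally frequent pairs are broken arbitrarily.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent symbol pair."""
    counts = Counter()
    i = 0
    while i + 1 < len(seq):
        counts[(seq[i], seq[i + 1])] += 1
        # In a stretch like "aaa", the pair (a, a) at i and i + 1 overlaps;
        # skip one position so that it is counted only once.
        if i + 2 < len(seq) and seq[i] == seq[i + 1] == seq[i + 2]:
            i += 2
        else:
            i += 1
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of pair with symbol, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair(text):
    """Return (start_sequence, rules) where rules maps nonterminal -> pair."""
    seq = list(text)  # terminals are single characters
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # stop when no pair occurs at least twice
            break
        symbol = "X{}".format(len(rules) + 1)  # hypothetical nonterminal name
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    start, rules = repair("abracadabra abracadabra")
    print(start, rules)
```

Expanding every symbol of the returned start sequence with `expand` and concatenating the results recovers the input text, which is a convenient sanity check for the sketch.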


The “Runs” Theorem
