Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

Giuliani, Sara; Inenaga, Shunsuke; Lipták, Zsuzsanna; Prezza, Nicola; Sciortino, Marinella; Toffanello, Anna

doi:10.1007/978-3-030-67731-2_18

Cited by 12 publications

(7 citation statements)

References 23 publications


“…We strengthen our upper bound results by presenting matching lower bounds on the worst-case sensitivity for all these major versions of the Lempel-Ziv 77 factorizations. This contrasts with the previously known related results such that the size z_78 of the Lempel-Ziv 78 factorization can increase by a factor of Ω(n^{1/4}) [Lagarde and Perifel, 2018], and the number r of runs in the Burrows-Wheeler transform can increase by a factor of Ω(log n) [Giuliani et al., 2021] when a character is prepended to an input string of length n. We also study the worst-case sensitivity of several grammar compression algorithms including Bisection, AVL-grammar, GCIS (Grammar Compression by Induced Sorting), and CDAWG (Compact Directed Acyclic Word Graph). Further, we extend the notion of the worst-case sensitivity to string repetitiveness measures such as the smallest string attractor size γ and the substring complexity δ, and present matching upper and lower bounds of the worst-case multiplicative sensitivity for γ and δ.…”

contrasting

confidence: 79%

“…The recent work by Giuliani et al. [20], however, shows that the number r of runs in the BWT of a string of length n can grow by a multiplicative factor of Ω(log n) when a single character is prepended to the input string. The other work by Lagarde and Perifel [33] shows that the size of the dictionary of LZ78, which is equal to the number of factors in the respective LZ78 factorization, can grow by a multiplicative factor of Ω(n^{1/4}), again when a single character is prepended to the input string.…”

Section: Introduction

mentioning

confidence: 99%

“…All the results reported in this article and in the related work are summarized in Table 1. [Table 1 fragment: insertion, Ω(log n) [20]] In addition to the aforementioned multiplicative sensitivity, we also present the worst-case additive sensitivity, which is defined as max_{T ∈ Σ^n} {C(T') − C(T) : ed(T, T') = 1}, for all the string compressors/repetitiveness measures C dealt with in this paper. We remark that the additive sensitivity allows one to observe and evaluate more details in the changes of the output sizes, as summarized in Table 2.…”

Section: Introduction

mentioning

confidence: 99%


Sensitivity of string compressors and repetitiveness measures

Akagi, Funakoshi, Inenaga

2021

Preprint

Self Cite


The sensitivity of a string compression algorithm C asks how much the output size C(T) for an input string T can increase when a single character edit operation is performed on T. This notion enables one to measure the robustness of compression algorithms in terms of errors and/or dynamic changes occurring in the input string. In this paper, we analyze the worst-case multiplicative sensitivity of string compression algorithms, which is defined by max_{T ∈ Σ^n} {C(T')/C(T) : ed(T, T') = 1}, where ed(T, T') denotes the edit distance between T and T'. In particular, for the most common versions of the Lempel-Ziv 77 compressors, we prove that the worst-case multiplicative sensitivity is only a small constant (2 or 3, depending on the version of the Lempel-Ziv 77 and the edit operation type), i.e., the size z of the Lempel-Ziv 77 factorizations can be larger by only a small constant factor. We strengthen our upper bound results by presenting matching lower bounds on the worst-case sensitivity for all these major versions of the Lempel-Ziv 77 factorizations. This contrasts with the previously known related results such that the size z_78 of the Lempel-Ziv 78 factorization can increase by a factor of Ω(n^{1/4}) [Lagarde and Perifel, 2018], and the number r of runs in the Burrows-Wheeler transform can increase by a factor of Ω(log n) [Giuliani et al., 2021] when a character is prepended to an input string of length n. We also study the worst-case sensitivity of several grammar compression algorithms including Bisection, AVL-grammar, GCIS (Grammar Compression by Induced Sorting), and CDAWG (Compact Directed Acyclic Word Graph). Further, we extend the notion of the worst-case sensitivity to string repetitiveness measures such as the smallest string attractor size γ and the substring complexity δ, and present matching upper and lower bounds of the worst-case multiplicative sensitivity for γ and δ.
We also exhibit the worst-case additive sensitivity max_{T ∈ Σ^n} {C(T') − C(T) : ed(T, T') = 1}, which allows one to observe more details in the changes of the output sizes.
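
The two sensitivity definitions in this abstract can be explored by brute force on very short strings. The following Python sketch is illustrative only and is not taken from the paper: it assumes the rotation-based BWT without an end marker (other conventions may give different run counts), restricts inserted and substituted characters to the same alphabet, and uses hypothetical helper names (`bwt_runs`, `single_edits`, `worst_case_sensitivity`). It takes the measure C to be the number r of BWT runs.

```python
from itertools import groupby, product


def bwt(text):
    """Last column of the sorted cyclic rotations (assumption: no end marker)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def bwt_runs(text):
    """Number r of maximal equal-letter runs in the BWT of text."""
    return sum(1 for _ in groupby(bwt(text)))


def single_edits(text, alphabet):
    """All strings over `alphabet` at edit distance exactly 1 from text."""
    edited = set()
    for i in range(len(text) + 1):
        for c in alphabet:
            edited.add(text[:i] + c + text[i:])        # insertion
    for i in range(len(text)):
        edited.add(text[:i] + text[i + 1:])            # deletion
        for c in alphabet:
            edited.add(text[:i] + c + text[i + 1:])    # substitution
    edited.discard(text)                               # distance 0, not 1
    return edited


def worst_case_sensitivity(measure, n, alphabet="ab"):
    """Return (max C(T')/C(T), max C(T') - C(T)) over all T of length n >= 1."""
    ratios, diffs = [], []
    for letters in product(alphabet, repeat=n):
        original = "".join(letters)
        base = measure(original)
        for changed in single_edits(original, alphabet):
            value = measure(changed)
            ratios.append(value / base)
            diffs.append(value - base)
    return max(ratios), max(diffs)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, worst_case_sensitivity(bwt_runs, n))
```

The search is exponential in n, so it is only useful for building intuition on tiny inputs; it says nothing about the asymptotic bounds quoted above.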


“…It is important to note that the performance in terms of both space and time of text compressors and compressed indexing data structures applied on a text w can be evaluated by using r_bwt(w) [11]. An upper bound on the number of clusters produced by BWT has been provided in [13], in particular, it has been proved that r_bwt(w) = O(z(w) log^2 n) where z(w) is the number of phrases in the LZ77 factorization of w and n is the length of w. The ratio between r_bwt(w) and the number of clusters in the BWT of the reverse of w has been studied in [12]. A recent comparative survey illustrating the properties of r_bwt(w) and other repetitiveness measures can be found in [19].…”

Section: Introduction

mentioning

confidence: 99%

Logarithmic equal-letter runs for BWT of purely morphic words

Frosini, Mancini, Rinaldi, et al.

2022

Preprint

Self Cite


In this paper we study the number r_bwt of equal-letter runs produced by the Burrows-Wheeler transform (BWT) when it is applied to purely morphic finite words, which are words generated by iterating prolongable morphisms. The parameter r_bwt is significant because it measures the performance of the BWT in terms of both compressibility and indexing. In particular, we prove that, when the BWT is applied to any purely morphic finite word on a binary alphabet, r_bwt is O(log n), where n is the length of the word. Moreover, we prove that r_bwt is Θ(log n) for the binary words generated by a large class of prolongable binary morphisms. These bounds are proved by providing some new structural properties of the bispecial circular factors of such words.
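
To make the objects in this abstract concrete, the sketch below iterates two well-known prolongable binary morphisms (Fibonacci: a→ab, b→a; Thue-Morse: a→ab, b→ba) and prints r_bwt for each iterate. It is an illustration under assumptions, not the authors' method: the BWT is the rotation-based variant without an end marker, and the names `iterate_morphism` and `r_bwt` are hypothetical.

```python
from itertools import groupby

# Two classical prolongable morphisms on the binary alphabet {a, b}.
FIBONACCI = {"a": "ab", "b": "a"}
THUE_MORSE = {"a": "ab", "b": "ba"}


def iterate_morphism(morphism, seed, steps):
    """Apply the morphism `steps` times to `seed`, letter by letter."""
    word = seed
    for _ in range(steps):
        word = "".join(morphism[letter] for letter in word)
    return word


def bwt(text):
    """Last column of the sorted cyclic rotations (assumption: no end marker)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def r_bwt(text):
    """Number of maximal equal-letter runs in the BWT of text."""
    return sum(1 for _ in groupby(bwt(text)))


if __name__ == "__main__":
    for name, morphism in (("Fibonacci", FIBONACCI), ("Thue-Morse", THUE_MORSE)):
        for steps in range(1, 9):
            word = iterate_morphism(morphism, "a", steps)
            print(name, steps, len(word), r_bwt(word))
```

Printing the word length next to r_bwt makes it easy to compare the growth of the two quantities for small iterates.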


“…Lyndon factors enjoy a rich class of algorithmic and stringology applications including: counting and finding the maximal repetitions (a.k.a. runs) in a string [2] and in a trie [8], constant-space pattern matching [3], comparison of the sizes of run-length Burrows-Wheeler Transform of a string and its reverse [4], substring minimal suffix queries [1], the shortest common superstring problem [7], and grammar-compressed self-index (Lyndon-SLP) [9].…”

Section: Introduction

mentioning

confidence: 99%

Counting Lyndon Subsequences

Hirakawa, Nakashima, Inenaga, et al.

2021

Preprint

Self Cite


Counting substrings/subsequences that preserve some property (e.g., palindromes, squares) is an important mathematical interest in stringology. Recently, Glen et al. studied the number of Lyndon factors in a string. A string w = uv is called a Lyndon word if it is the lexicographically smallest among all of its conjugates vu. In this paper, we consider a more general problem "counting Lyndon subsequences". We show (1) the maximum total number of Lyndon subsequences in a string, (2) the expected total number of Lyndon subsequences in a string, (3) the expected number of distinct Lyndon subsequences in a string.
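
The Lyndon-word definition in this abstract can be checked directly on small inputs. The sketch below is a brute-force illustration, not the paper's counting technique: it assumes the usual strict reading of the definition (a Lyndon word is strictly smaller than every proper conjugate), assumes that the "total" count treats subsequences at different position sets as different, and uses hypothetical function names.

```python
from itertools import combinations


def is_lyndon(word):
    """True if word is non-empty and strictly smaller than all its proper conjugates."""
    return len(word) > 0 and all(
        word < word[i:] + word[:i] for i in range(1, len(word))
    )


def count_lyndon_subsequences(text):
    """Return (total, distinct) counts of Lyndon subsequences of text.

    total counts every choice of positions separately (an assumption);
    distinct counts each resulting string once.
    """
    total = 0
    distinct = set()
    for length in range(1, len(text) + 1):
        for positions in combinations(range(len(text)), length):
            candidate = "".join(text[p] for p in positions)
            if is_lyndon(candidate):
                total += 1
                distinct.add(candidate)
    return total, len(distinct)


if __name__ == "__main__":
    print(count_lyndon_subsequences("aab"))  # (6, 4)
```

For "aab" the Lyndon subsequences are a (twice), b, ab (twice), and aab, which gives 6 in total and 4 distinct. The enumeration visits all 2^n − 1 non-empty position sets, so it is only practical for short strings.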
