2021
DOI: 10.1007/978-3-030-67731-2_18

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

Abstract: The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter runs (usually referred to as r) is often considerably lower than that of the original string; i…
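To make the quantity r concrete, here is a minimal Python sketch that computes a BWT and counts equal-letter runs. It assumes the textbook definition via sorted rotations with no end-of-string marker, which may differ from the variant used in the paper; the function names `bwt` and `count_runs` are illustrative only.

```python
from itertools import groupby


def bwt(s: str) -> str:
    """Return the last column of the sorted rotation matrix of s.

    Naive construction (sorts all rotations); intended for short examples only.
    """
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Count maximal blocks of equal consecutive letters in s."""
    return sum(1 for _ in groupby(s))


text = "banana"
transformed = bwt(text)
# Compare the number of runs before and after the transform.
print(transformed, count_runs(text), count_runs(transformed))
```

For "banana" the transform groups equal letters together, so the run count drops; on non-repetitive inputs the drop can be much smaller or absent.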

Cited by 12 publications (7 citation statements)
References 23 publications
“…We strengthen our upper bound results by presenting matching lower bounds on the worst-case sensitivity for all these major versions of the Lempel-Ziv 77 factorizations. This contrasts with the previously known related results such that the size z_78 of the Lempel-Ziv 78 factorization can increase by a factor of Ω(n^{3/4}) [Lagarde and Perifel, 2018], and the number r of runs in the Burrows-Wheeler transform can increase by a factor of Ω(log n) [Giuliani et al, 2021] when a character is prepended to an input string of length n. We also study the worst-case sensitivity of several grammar compression algorithms including Bisection, AVL-grammar, GCIS (Grammar Compression by Induced Sorting), and CDAWG (Compact Directed Acyclic Word Graph). Further, we extend the notion of the worst-case sensitivity to string repetitiveness measures such as the smallest string attractor size γ and the substring complexity δ, and present matching upper and lower bounds of the worst-case multiplicative sensitivity for γ and δ.…”
contrasting
confidence: 79%
“…The recent work by Giuliani et al [20], however, shows that the number r of runs in the BWT of a string of length n can grow by a multiplicative factor of Ω(log n) when a single character is prepended to the input string. The other work by Lagarde and Perifel [33] shows that the size of the dictionary of LZ78, which is equal to the number of factors in the respective LZ78 factorization, can grow by a multiplicative factor of Ω(n^{1/4}), again when a single character is prepended to the input string.…”
Section: Introduction
mentioning
confidence: 99%
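The sensitivity described in this statement can be explored on small inputs with a rough Python sketch. It assumes a BWT computed on the string with a unique end marker `$` appended (one common convention, not necessarily the one used in the cited work); `bwt_with_sentinel`, `run_count`, and `prepend_effect` are hypothetical helper names.

```python
from itertools import groupby


def bwt_with_sentinel(w: str, sentinel: str = "$") -> str:
    """BWT of w + sentinel via sorted rotations (naive, for short inputs).

    Assumes the sentinel does not occur in w and sorts before every letter of w.
    """
    s = w + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_count(s: str) -> int:
    """Number of maximal equal-letter runs in s."""
    return sum(1 for _ in groupby(s))


def prepend_effect(w: str, alphabet: str) -> dict:
    """Map each character c to r(c + w), the run count after prepending c."""
    return {c: run_count(bwt_with_sentinel(c + w)) for c in alphabet}


word = "banana"
# r before prepending, followed by r after prepending each candidate character.
print(run_count(bwt_with_sentinel(word)), prepend_effect(word, "abn"))
```

On strings this short the change in r is small; the Ω(log n) growth quoted above refers to specially constructed string families, which this sketch does not reproduce.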
“…It is important to note that the performance in terms of both space and time of text compressors and compressed indexing data structures applied on a text w can be evaluated by using r_bwt(w) [11]. An upper bound on the number of clusters produced by BWT has been provided in [13], in particular, it has been proved that r_bwt(w) = O(z(w) log^2 n) where z(w) is the number of phrases in the LZ77 factorization of w and n is the length of w. The ratio between r_bwt(w) and the number of clusters in the BWT of the reverse of w has been studied in [12]. A recent comparative survey illustrating the properties of r_bwt(w) and other repetitiveness measures can be found in [19].…”
Section: Introduction
mentioning
confidence: 99%
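The two quantities in the bound above, r_bwt(w) and z(w), can be computed side by side for a small input. The sketch below is only illustrative: it assumes a rotation-based BWT without an end marker and a greedy LZ77 parse in which a phrase may overlap its earlier occurrence, and definitions in the cited works may differ in such details. The names `bwt_runs` and `lz77_phrases` are hypothetical.

```python
from itertools import groupby
from math import log2


def bwt_runs(w: str) -> int:
    """Number of equal-letter runs in the rotation-based BWT of w (naive)."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    last_column = "".join(rotation[-1] for rotation in rotations)
    return sum(1 for _ in groupby(last_column))


def lz77_phrases(w: str) -> int:
    """Number of phrases in a greedy LZ77 parse of w (cubic time, naive).

    Each phrase is the longest prefix of the remaining text that also starts
    at an earlier position (overlap allowed), or a single fresh character.
    """
    n = len(w)
    i = 0
    phrases = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)  # a fresh character forms a phrase of length 1
        phrases += 1
    return phrases


w = "abracadabra" * 4
n = len(w)
# Print r, z, and the value z * (log2 n)^2 that appears inside the O(...) bound.
print(bwt_runs(w), lz77_phrases(w), lz77_phrases(w) * log2(n) ** 2)
```

The bound is asymptotic and hides a constant, so the printed numbers only give a feel for the relative sizes of r and z on one repetitive input.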
“…Lyndon factors enjoy a rich class of algorithmic and stringology applications including: counting and finding the maximal repetitions (a.k.a. runs) in a string [2] and in a trie [8], constant-space pattern matching [3], comparison of the sizes of run-length Burrows-Wheeler Transform of a string and its reverse [4], substring minimal suffix queries [1], the shortest common superstring problem [7], and grammar-compressed self-index (Lyndon-SLP) [9].…”
Section: Introduction
mentioning
confidence: 99%

Counting Lyndon Subsequences
Hirakawa, Nakashima, Inenaga et al. 2021. Preprint. Self cite.