2020
DOI: 10.1007/978-3-030-61792-9_17

Towards a Definitive Measure of Repetitiveness

Cited by 40 publications (77 citation statements)
References 37 publications
“…Examples of those measures include (but are not limited to) the number z of factors in the LZ77 factorization [21], the number g of rules in the smallest context-free grammar generating the word [17], the size b of the smallest bidirectional macro scheme [26], and the size e of the CDAWG [4]. More recently, it was shown that all those compressors are particular cases of a combinatorial object named string attractor [16] whose size γ lower-bounds all measures r, z, g, b, and e. In turn, in [19] it was shown that γ is lower-bounded by another measure, δ, which is linked to factor complexity (that is, to the number of distinct factors of each length) and better captures the word's repetitiveness. On the upper-bound side, the papers [16,19] provided approximation ratios of all measures but r with respect to γ.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…More recently, it was shown that all those compressors are particular cases of a combinatorial object named string attractor [16] whose size γ lower-bounds all measures r, z, g, b, and e. In turn, in [19] it was shown that γ is lower-bounded by another measure, δ, which is linked to factor complexity (that is, to the number of distinct factors of each length) and better captures the word's repetitiveness. On the upper-bound side, the papers [16,19] provided approximation ratios of all measures but r with respect to γ. Finding an upper-bound for r remained an open problem until the recent work of Kempa and Kociumaka [15], who showed that, for any word of length n, r = O(δ log² n) (which in turn implies r = O(γ log² n)).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
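
The statement above describes δ only as "linked to factor complexity". As a minimal sketch, assuming the usual definition δ = max over k of d_k / k, where d_k is the number of distinct length-k factors of the word, δ can be computed naively as follows. The function name `delta` is illustrative, not taken from the cited papers, and the quadratic-space enumeration is meant for small examples only.

```python
def delta(w: str) -> float:
    """Naive substring complexity: max over k of (distinct length-k factors) / k.

    Assumes the definition delta = max_k d_k / k; returns 0.0 for the empty word.
    """
    n = len(w)
    best = 0.0
    for k in range(1, n + 1):
        # d_k: number of distinct factors (substrings) of length k
        d_k = len({w[i:i + k] for i in range(n - k + 1)})
        best = max(best, d_k / k)
    return best


if __name__ == "__main__":
    print(delta("aaaa"))        # a run of one letter: every d_k is 1
    print(delta("abab"))        # maximum attained at k = 1
    print(delta("0001011100"))  # maximum attained at k = 3 (all 8 binary 3-mers occur)
```

The last example shows why the maximum is taken over all lengths: for a binary word that contains every 3-mer, k = 3 gives 8/3, which exceeds the alphabet size 2 obtained at k = 1.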
“…For example, there are indexes based on LZ77 [37], RLBWT [17], and grammar-based compression [11]. Although recent studies [33,36,45] have investigated the fundamentals of these techniques and obtained a unified view of the compressibility of highly repetitive data, each compressed format still has pros and cons that cannot be ignored in practice. LZ77 usually achieves better compression than other compression methods, the index based on RLBWT (called r-index) supports very fast pattern search, and grammar-based compression is easy to handle in both theory and practice.…”
Section: Restructuring Compressed Data (citation type: mentioning)
confidence: 99%
“…Both new measures better capture the compressibility of repetitive strings. It has been proved that δ ≤ γ ≤ z = O(δ lg(n/δ)) [7,8]. In this paper, we design the first string-attractor-based indexes (which also work on top of an LZ parsing) to support computation of the matching statistics with space cost measured by γ and δ.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
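
The chain δ ≤ γ ≤ z quoted above can be checked by brute force on short words. The sketch below makes two assumptions that the statement does not spell out: z counts the phrases of an LZ77-style parse in which each phrase is the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed), or a single fresh character; and γ is the size of a smallest string attractor, found here by exhaustive search. All function names are illustrative, and the exponential search is only feasible for very short inputs.

```python
from itertools import combinations


def delta(w: str) -> float:
    """Substring complexity, assuming delta = max_k (distinct length-k factors) / k."""
    n = len(w)
    return max(
        (len({w[i:i + k] for i in range(n - k + 1)}) / k for k in range(1, n + 1)),
        default=0.0,
    )


def lz77_phrases(w: str) -> list[str]:
    """Greedy LZ77-style parse: longest previous factor, else one new character."""
    phrases, i, n = [], 0, len(w)
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        longest = max(longest, 1)  # a fresh character forms its own phrase
        phrases.append(w[i:i + longest])
        i += longest
    return phrases


def is_attractor(w: str, positions) -> bool:
    """True if every distinct factor has an occurrence covering a chosen position."""
    n, chosen = len(w), set(positions)
    for k in range(1, n + 1):
        all_factors = {w[i:i + k] for i in range(n - k + 1)}
        covered = {
            w[i:i + k]
            for i in range(n - k + 1)
            if any(p in chosen for p in range(i, i + k))
        }
        if covered != all_factors:
            return False
    return True


def gamma(w: str) -> int:
    """Size of a smallest string attractor, by exhaustive search (tiny inputs only)."""
    n = len(w)
    for size in range(n + 1):
        if any(is_attractor(w, c) for c in combinations(range(n), size)):
            return size
    return n


if __name__ == "__main__":
    for word in ["aaaa", "abab", "abracadabra"]:
        print(word, delta(word), gamma(word), len(lz77_phrases(word)))
```

For "abab", the parse is a, b, ab, so z = 3, while both letters must be covered by the attractor, so γ = 2, which matches δ = 2.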
“…To access the text T[1..n] within compressed space, we apply the string indexing data structure by Kociumaka et al. [8] with space cost measured by δ. We give a simple and practical algorithm that reduces the problem of computing MS to O(m²) 2D orthogonal range predecessor queries over γ points on the grid.…”
Section: Introduction (citation type: mentioning)
confidence: 99%