2021 Data Compression Conference (DCC)
DOI: 10.1109/dcc50243.2021.00016

A grammar compressor for collections of reads with applications to the construction of the BWT

Cited by 10 publications (8 citation statements)
References 24 publications
“…There, we measure the time for count(P) with |P| = 2^x for each x ∈ [8..15]. For each data point and each dataset T, we extract 2^12 random samples of equal length from T, perform the query for each sample, and measure the average time per character.⁵ From Fig.…”
Section: Methods (mentioning)
confidence: 99%
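
The measurement procedure quoted above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' benchmark code: the function names (`sample_patterns`, `naive_count`, `avg_time_per_char`) are hypothetical, and `naive_count` is a plain substring scan standing in for whatever index is actually being timed.

```python
import random
import time


def sample_patterns(text, length, num_samples, rng):
    """Draw num_samples substrings of the given length at random positions."""
    last_start = len(text) - length
    starts = (rng.randint(0, last_start) for _ in range(num_samples))
    return [text[s:s + length] for s in starts]


def naive_count(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text.

    Placeholder for count(P) on a real index; assumed for illustration only.
    """
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def avg_time_per_char(text, count_fn, length, num_samples, rng):
    """Average query time per pattern character over the sampled patterns."""
    patterns = sample_patterns(text, length, num_samples, rng)
    start = time.perf_counter()
    for pattern in patterns:
        count_fn(text, pattern)
    elapsed = time.perf_counter() - start
    return elapsed / (num_samples * length)
```

In the quoted setup, `length` would range over 2^x for x ∈ [8..15] and `num_samples` would be 2^12; smaller values are enough to try the sketch on a short test string.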
“…They applied a grammar compression merging frequent bigrams similar to Re-Pair [28], and empirically could improve the computation of the BWT as well as the reconstruction of the text from the BWT. With a similar target, Díaz-Domínguez and Navarro [12,13] computed the extended BWT [31], a BWT variant for multiple texts, from the GCIS grammar.…”
Section: Related Work (mentioning)
confidence: 99%
“…The experiments showed that the r-index outperforms all the other implemented indexes by orders of magnitude in space or in time to locate pattern occurrences on highly repetitive datasets. However, other experiments on more typical repetitiveness scenarios [23,5,6,1] showed that the space of the r-index degrades very quickly as repetitiveness decreases. For example, a grammar-based index (which can be of size g = O(z log(n/z))) is usually slower but significantly smaller [5], and an even slower Lempel-Ziv based index of size O(z) [15] is even smaller.…”
Section: Introduction (mentioning)
confidence: 94%
“…However, r degrades faster than z as repetitiveness drops: in an experiment on bacterial genomes in the same article, where n/r ≈ 100, the r-index space approaches 0.9 bps, 4 times that of the lz-index; r also approaches 4z. Experiments on other datasets show that the r-index tends to be considerably larger [23,5,6,1].¹ Indeed, in some realistic cases n/r can be over 1,500, but in most cases it is well below: 40-160 on versioned software and document collections and fully assembled human chromosomes, 7.5-50 on virus and bacterial genomes (with r in the range 4z-7z), and 4-9 on sequencing reads; see Section 5.…”
Section: Introduction (mentioning)
confidence: 99%
“…Nunes et al. [33] showed how to compute the suffix array and the longest-common-prefix array from GCIS during a decompression step restoring the original text. Recently, Díaz-Domínguez and Navarro [10] show how to compute the BWT directly from the GCIS grammar.…”
Section: Related Work (mentioning)
confidence: 99%