Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms 2018
DOI: 10.1137/1.9781611975031.96

Optimal-Time Text Indexing in BWT-runs Bounded Space

Abstract: Indexing highly repetitive texts (such as genomic databases, software repositories and versioned text collections) has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is r, the number of runs in their Burrows-Wheeler Transform (BWT). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used O(r) space and was able to efficiently count the number of occurrences of a pattern of length m in the text (in loglogarithmic…
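To make the measure r concrete, here is a minimal sketch that builds a BWT by naively sorting rotations and then counts its runs of equal characters. It is an illustration only, not the paper's construction: the function names `bwt` and `count_runs` and the `$` terminator are assumptions made here, and the quadratic-space rotation sort is practical only for tiny inputs.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Return the BWT of text via naive rotation sorting.

    Assumes the sentinel does not occur in text and sorts before
    every character of text (true for '$' versus letters and digits).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort all cyclic rotations; the BWT is the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Return the number of maximal runs of equal characters in s."""
    return sum(1 for _ in groupby(s))


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
```

For the input "banana" the transform is "annb$aa", which has r = 5 runs (a, nn, b, $, aa).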

Cited by 81 publications (129 citation statements)
References: 79 publications
“…Currently, SA-scan uses n log n + n log σ bits of space for a text of length n on alphabet σ for the suffix array and text, respectively (the WT approach of Bader et al. uses slightly more). Because our methods consist (mostly) of simple scans of SA ranges or scans of the underlying text, they are easily translated to make use of recent results on Burrows-Wheeler-based compressed indexes [10] that allow fast access to elements of the suffix array from a compressed representation of it. Via this observation we derive the first compressed indexes for VLG matching.…”
Section: Discussion
mentioning
confidence: 99%
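As a rough worked example of the space bound quoted in this statement, the sketch below evaluates n log n + n log σ bits for a given text length and alphabet size. It assumes base-2 logarithms rounded up to whole bits per entry, which the quoted text does not specify; the function name `sa_scan_bits` and the sample sizes are hypothetical and chosen only for illustration.

```python
def ceil_log2(x: int) -> int:
    """Return ceil(log2(x)) for x >= 1 using integer arithmetic."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return (x - 1).bit_length()


def sa_scan_bits(n: int, sigma: int) -> int:
    """Bits for a plain suffix array plus the text itself.

    Assumption: each suffix array entry takes ceil(log2 n) bits and
    each text symbol takes ceil(log2 sigma) bits.
    """
    suffix_array_bits = n * ceil_log2(n)
    text_bits = n * ceil_log2(sigma)
    return suffix_array_bits + text_bits


if __name__ == "__main__":
    # Illustrative sizes only: a text of 3 billion symbols over 4 letters.
    total_bits = sa_scan_bits(3_000_000_000, 4)
    print(total_bits / 8 / 1e9, "GB")
```

Under these assumptions the suffix array dominates: 32 bits per entry against 2 bits per text symbol in the illustrative case.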
“…The Burrows-Wheeler Transform (BWT) [2] and FM-index [3] are central to the most popular short-read aligners, such as BWA [9] and Bowtie [8,7], but until recently it was not known how to apply these concepts effectively to whole genomic databases. Building on previous authors' work [11], Gagie, Navarro and Prezza [4] described how a fully functional variant of the FM-index for such a database could be stored in reasonable space: their variant takes O(r) machine words, where r is the number of runs in the BWT of the database, and thus is called the r-index. Prezza [14] gave a preliminary implementation, which was significantly extended by Boucher et al [1] and Kuhnle et al [6].…”
Section: Introduction
mentioning
confidence: 99%
“…With the TOPMed dataset, we achieve similar proportions by storing the identifiers at one out of 16,384 positions, making locate() 16 to 22 times slower. There is a theoretical proposal for supporting fast locate() queries in space proportional to the size of the run-length encoded BWT (Gagie et al, 2018). While there has been some progress in building the proposed index for large datasets (Kuhnle et al, 2019), scaling it up to TOPMed scale is still an open problem.…”
mentioning
confidence: 99%