2020
DOI: 10.1089/cmb.2019.0309

Efficient Construction of a Complete Index for Pan-Genomics Read Alignment

Abstract: Short-read aligners predominantly use the FM-index, which is easily able to index one or a few human genomes. However, it does not scale well to indexing collections of thousands of genomes. Driving this issue are the two chief components of the index: (1) a rank data structure over the Burrows-Wheeler Transform (BWT) of the string that will allow us to find the interval in the string's suffix array (SA), and (2) a sample of the SA that, when used with the rank data structure, allows us to access the SA. The ran…
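
The two components named in the abstract can be illustrated with a toy sketch. It is not the paper's construction: it builds naive rank tables over the BWT and uses them for backward search to find a pattern's SA interval. For simplicity it stores the full SA rather than a sample, and the function names and the "$" sentinel are assumptions made for this example.

```python
def build_fm(text):
    """Build a toy FM-index: full SA, BWT, C array, and naive rank tables."""
    # Assumption: "$" does not occur in text and sorts before every other character.
    text = text + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[i] is the character preceding suffix sa[i] (wraps to "$" when sa[i] == 0).
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c] = number of characters in text strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i] = rank of c in bwt[:i]; real indexes compress this heavily.
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, bwt, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open SA interval [lo, hi) of suffixes prefixed by pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return (0, 0)
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return (0, 0)
    return (lo, hi)


sa, bwt, C, occ = build_fm("banana")
lo, hi = backward_search("ana", C, occ, len(bwt))
positions = sorted(sa[lo:hi])  # text positions where "ana" occurs
```

The interval comes from rank queries alone; the SA is needed only to turn interval entries into text positions, which is why an index can keep just a sample of it.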

Cited by 41 publications (20 citation statements)
References 35 publications (47 reference statements)
“…This might be accomplished using unsupervised, sequence-driven clustering methods [ 34 , 35 ], using the “founder sequence” framework [ 36 , 37 ], or using some form of submodular optimization [ 38 ]. A more radical idea is to simply index all available individuals, forgoing the need to choose representatives; this is becoming more practical with the advent of new approaches for haplotype-aware path indexing [ 31 ] and efficient indexing for repetitive texts [ 39 ].…”
Section: Discussion (mentioning)
confidence: 99%
“…As previously mentioned, Gagie et al (2020) did not describe how to build the r-index - this was shown in a series of papers (Kuhnle et al 2020; Mun et al 2020; Boucher et al 2019). In particular, Boucher et al (2019) introduced Prefix Free Parsing (PFP), which takes as input a string S, window size w, and a prime p and produces a dictionary of substrings of S and a parse of S, that is a sequence of substrings in the alphabet (Kreft and Navarro 2013) - and showed how to build RLBWT from the dictionary and parse.…”
Section: How To Construct the R-index (mentioning)
confidence: 99%
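
The inputs and outputs of PFP described in this statement can be sketched roughly as follows. This is a simplified illustration and not the implementation from Boucher et al (2019). It assumes a window is a "trigger" when its hash is divisible by p, recomputes a plain polynomial hash per window instead of using a rolling hash, and pads only the end of S with w copies of an assumed "$" sentinel. The helper names are hypothetical.

```python
def window_hash(window, base=256, modulus=(1 << 61) - 1):
    """Deterministic polynomial hash of a window (recomputed, not rolling)."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % modulus
    return h


def prefix_free_parse(s, w, p):
    """Split s into overlapping phrases; return (dictionary, parse)."""
    # Assumption: "$" does not occur in s. The final window "$" * w always ends a phrase.
    text = s + "$" * w
    phrases = []
    start = 0
    last = len(text) - w
    for i in range(1, last + 1):
        # Assumed trigger rule: hash of the length-w window is 0 modulo p.
        if i == last or window_hash(text[i:i + w]) % p == 0:
            # A phrase runs through the end of its closing trigger window,
            # so consecutive phrases overlap by exactly w characters.
            phrases.append(text[start:i + w])
            start = i
    dictionary = sorted(set(phrases))
    rank = {phrase: k for k, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def reconstruct(dictionary, parse, w):
    """Rebuild the padded text by dropping each later phrase's w-character overlap."""
    if not parse:
        return ""
    return dictionary[parse[0]] + "".join(dictionary[k][w:] for k in parse[1:])


s = "GATTACAGATTACAGATTACA" * 3
dictionary, parse = prefix_free_parse(s, w=3, p=5)
padded = reconstruct(dictionary, parse, w=3)  # s followed by "$$$"
```

On repetitive input the same phrases recur, so the dictionary stays small relative to S while the parse records their order; that is the property that makes building the RLBWT from these two pieces attractive.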
“…Briefly explained, FM-index alignment tools are derived from the Burrows-Wheeler Transform [ 68 ], a method to sufficiently compress large amounts of data and find approximate matches of sequences in the reference genome [ 69 ]. Hash table-based aligners use the seed-and-extend method in combination with additional alignment algorithms [ 68 , 70 , 71 ].…”
Section: Precautions Of Data Output From Sequencing (mentioning)
confidence: 99%