2020
DOI: 10.1145/3381417

Linear-time String Indexing and Analysis in Small Space

Abstract: The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size in bits. In many large-scale applications, the construction of the index and its usage need to be considered as one …

Cited by 26 publications (75 citation statements)
References 71 publications
“…iGenomics uses a version of QuickSort, a divide-and-conquer sorting algorithm, because on average it takes O(n log n) time for n objects to be sorted. Although there are now some more efficient BWT construction algorithms [31], given that iGenomics is targeted towards relatively small genomes (<100,000 bp), the amount of time for BWT sorting is negligible compared to the time to align the reads. Finally, to obtain the BWT from the sorted array, the final character of each row in the matrix is copied into a string with the first character copied having the first position, the second character copied having the second position, and so forth.…”
Section: Methods (mentioning)
confidence: 99%
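
The sort-based construction described in this statement can be illustrated with a minimal sketch. It is not the iGenomics code: the function name `naive_bwt` and the `$` end-of-text sentinel are assumptions made for illustration, and Python's built-in `sorted` (Timsort) stands in for the QuickSort variant the statement mentions. Each comparison between two rotations can itself take O(n) character steps, so this approach is only reasonable for short inputs such as the small genomes described above.

```python
def naive_bwt(text: str, sentinel: str = "$") -> str:
    """Build the BWT by sorting all rotations and reading the last column.

    Illustrative only: the sentinel is an assumption, not taken from the
    quoted statement. It must not occur in the text and must sort before
    every other character (true for "$" against letters in ASCII).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Rows of the rotation matrix, in lexicographic order.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The final character of each sorted row, taken in order, is the BWT.
    return "".join(row[-1] for row in rotations)


if __name__ == "__main__":
    print(naive_bwt("banana"))  # annb$aa
```

Materializing every rotation uses quadratic memory, which is the cost that linear-time, small-space constructions such as the one in the cited paper are designed to avoid.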
“…We will also propose a heuristic version of the algorithm that solves a relaxed variant of Problem 1 in linear time O(n). All these complexities are on top of the FMD-index construction [25], which in our case can be done in O(m) time and space [5].…”
Section: Problem Definition (mentioning)
confidence: 99%
“…time as they are sequenced [39]. As long read sequencing replaces the current technology, the advantages of TGA will increase further, because indexing reads will be quicker [40] and the number of blueprint update steps will decrease.…”
Section: Blueprint Genome (mentioning)
confidence: 99%