2017
DOI: 10.1007/s11786-016-0283-z

Compressed Spaced Suffix Arrays

Abstract: Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data structure, either a hash table or a spaced suffix array (SSA). In this paper we show how to compress SSAs relative to normal suffix arrays (SAs) and still support fast random access to them. We first prove a theoretical upper bound on the space needed to store an S…
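
To make the abstract's terms concrete, here is a minimal sketch of a normal suffix array and a spaced suffix array for a toy text. It is an illustration only, not the paper's construction or compression scheme. The function names, the example text, the seed "101", and the rule that ties are broken by text position are all assumptions of this sketch.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def masked(text, i, seed):
    """Characters of the window text[i:i+len(seed)] at the seed's '1' positions."""
    window = text[i:i + len(seed)]
    # zip stops at the shorter input, so windows cut off by the end of the
    # text simply yield a shorter masked string.
    return "".join(c for c, bit in zip(window, seed) if bit == "1")


def spaced_suffix_array(text, seed):
    """Positions sorted by masked window; ties broken by position (assumption)."""
    return sorted(range(len(text)), key=lambda i: (masked(text, i, seed), i))


text = "ACGTACGA"   # hypothetical toy text
seed = "101"        # hypothetical seed: compare 1st and 3rd characters only

sa = suffix_array(text)
ssa = spaced_suffix_array(text, seed)
print("SA: ", sa)
print("SSA:", ssa)
```

On this toy input the two arrays differ only in the relative order of positions 0 and 4, which gives some intuition for why an SSA might be stored compactly relative to the SA. The naive sorting used here is quadratic in the worst case and is meant only for small examples.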

Citations: cited by 2 publications (3 citation statements)
References: 36 publications
“…The success of the mainstream mappers BWA-MEM (Li, 2013) and Bowtie2 (Langmead and Salzberg, 2012) is due in part to the FM-index, which only supports contiguous seeds. Some workarounds are available for spaced seeds (Horton et al, 2008; Gagie et al, 2017) but they increase the memory footprint, explaining that short reads are typically mapped using contiguous seeds. More generally, computing the sensitivity of spaced seeds is challenging (Kucherov et al, 2006; Li et al, 2006; Martin and Noé, 2017).…”
Section: Seeds (mentioning)
confidence: 99%
“…Even if a full suffix array would be too large, we can consider a sparse index, and/or distributing the reference sequences into separately-indexed volumes. Furthermore, these indexes usually compress standard suffix arrays, and it is unclear how effectively they can be extended to subset seeding, minimizers, etc [8], [31].…”
Section: Compact / Succinct / Compressed Indexes (mentioning)
confidence: 99%
“…seeds" [34]. It is also possible to compress a spaced-seed suffix array relative to a normal suffix array [31].…”
Section: Multiple Seed Patterns (mentioning)
confidence: 99%