2018
DOI: 10.1016/j.cels.2018.05.021
Abstract: Sequence-level searches on large collections of RNA sequencing experiments, such as the NCBI Sequence Read Archive (SRA), would enable one to ask many questions about the expression or variation of a given transcript in a population. Existing approaches, such as the sequence Bloom tree, suffer from fundamental limitations of the Bloom filter, resulting in slow build and query times, less-than-optimal space usage, and potentially large numbers of false-positives. This paper introduces Mantis, a space-efficient …

Cited by 88 publications (108 citation statements)
References 26 publications
“…The resulting structure is usually referred to as a colored de Bruijn graph [19] and its representations have been widely studied ([50]–[61]). Even though we touched this setting in the section Multiple pan-genomes, exploiting the similarity between individual de Bruijn graphs for further compression in simplitig-based approaches is to be addressed in future work.…”
Section: Discussion | mentioning
confidence: 99%
“…In total we generated three different kinds of series: i) random matrices with uniformly distributed set bits, ii) initially generated random matrix rows duplicated and permuted randomly, iii) initially generated random matrix columns duplicated and permuted randomly. The motivation behind these series is as follows: The best performing state-of-the-art compressors exploit redundancy between rows of the binary relation matrix [19]. However, the usual structure of annotated de Bruijn graphs often implies a correlation structure on the columns not necessarily leading to redundant rows, for instance when the sequences of many similar or closely related samples are inserted.…”
Section: Data | mentioning
confidence: 99%
“…Kingsford Human RNA-Seq. This dataset consists of 2,652 Human RNA-Seq experiments originally drawn from [21] and subsequently used in [19] for comparison. NCBI RefSeq.…”
Section: Data | mentioning
confidence: 99%