2020
DOI: 10.1093/bioinformatics/btaa487

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

Abstract: Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets. Results: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB…

Cited by 44 publications (55 citation statements)
References 29 publications
“…We recall that in this work, we are interested in SPSS that represents a set of k-mers and will refer to them, and will not take into account multi-sets. Unitigs are one SPSS, super-k-mers of unitigs are another [14]. Two other equivalent SPSSs schemes, UST [21] and simplitigs [32], longer than unitigs, were recently independently proposed.…”
Section: Results (mentioning)
confidence: 99%
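
The idea that a set of strings can stand in for a set of k-mers can be illustrated with a minimal Python sketch. It greedily merges overlapping k-mers into longer strings, in the spirit of simplitigs but not the published UST or simplitig algorithms, and leaves the result in `spss` so that it can be compared against the original k-mer set. The names `kmer_set` and `greedy_spss` are hypothetical, the input string is made up, and reverse complements are ignored for brevity.

```python
def kmer_set(strings, k):
    """Return the set of all k-mers occurring in the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def greedy_spss(kmers):
    """Greedily merge k-mers (k >= 2) that overlap by k-1 characters.

    Assumption: forward strand only (no reverse complements).
    """
    remaining = set(kmers)
    if not remaining:
        return []
    k = len(next(iter(remaining)))
    strings = []
    while remaining:
        s = min(remaining)  # deterministic seed choice
        remaining.remove(s)
        # Extend to the right while an unused successor k-mer exists.
        while True:
            nxt = next((s[-(k - 1):] + c for c in "ACGT"
                        if s[-(k - 1):] + c in remaining), None)
            if nxt is None:
                break
            remaining.remove(nxt)
            s += nxt[-1]
        # Extend to the left while an unused predecessor k-mer exists.
        while True:
            prv = next((c + s[:k - 1] for c in "ACGT"
                        if c + s[:k - 1] in remaining), None)
            if prv is None:
                break
            remaining.remove(prv)
            s = prv[0] + s
        strings.append(s)
    return strings


# Toy input: the k-mers of one made-up sequence.
kmers = kmer_set(["ACGTACGGT"], 3)
spss = greedy_spss(kmers)
```

Because every k-mer is consumed exactly once, `kmer_set(spss, 3)` equals `kmers`, while the total length of `spss` is no larger than storing each k-mer separately.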
“…The challenge of indexing colored de Bruijn graphs [34] (or more generally to answer large sequence search problems as defined in [10]) have caught the interest of a community and could be a direct application of this work. For example, BLight is successfully integrated as an indexing structure in REINDEER [14], a k-mer data structure that enables the quantification of query sequences in thousands of raw read samples.…”
Section: Discussion (mentioning)
confidence: 99%
“…For instance, data structures for membership queries [78] relying on unitigs [38, 40–43] could be redesigned to use simplitigs instead. In many applications, including some of the traditional alignment-free methods [13, 14], it is desirable to consider k-mers with counts, which leads to so-called weighted de Bruijn graphs [79]; a recent manuscript [80] introduced monotigs which are a form of short simplitigs to encode this information. Furthermore, multiple de Bruijn graphs are often considered simultaneously; the resulting structure is usually referred to as a colored de Bruijn graph [15] and the associated data structures have been also widely studied [41, 43, 51, 81–89].…”
Section: Discussion (mentioning)
confidence: 99%
“…The vast majority of these large-scale k-mer indexing tools are based on common building blocks, three of them being: 1) k-mer counting, which summarizes sequencing data into a set of k-mers along with their abundances, 2), k-mer matrix construction, which aggregates lists of k-mer counts over a collection of samples (e.g. as in Marchet et al (2020); Muggli et al (2019)) in the form of a k-mer/sample matrix with abundances as values, and 3) Bloom filters construction, where the k-mer presence/absence information for each sample is converted into a Bloom filter to save space and allow fast queries. Note that these building blocks are not specific to k-mer indexing tools, e.g.…”
Section: Introduction (mentioning)
confidence: 99%
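
The three building blocks named in this statement can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how REINDEER or any of the cited tools is implemented: `count_kmers`, `build_matrix`, and `BloomFilter` are hypothetical names, the reads are made up, and the Bloom filter derives its bit positions from SHA-256 purely for convenience.

```python
import hashlib
from collections import Counter


def count_kmers(reads, k):
    """Block 1: summarize reads as k-mers with their abundances."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_matrix(sample_counts):
    """Block 2: aggregate per-sample counts into a k-mer/sample matrix."""
    samples = sorted(sample_counts)
    all_kmers = sorted(set().union(*sample_counts.values()))
    # Counter returns 0 for k-mers that are absent from a sample.
    matrix = {km: [sample_counts[s][km] for s in samples] for km in all_kmers}
    return samples, matrix


class BloomFilter:
    """Block 3: compact presence/absence set with possible false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: salted SHA-256 stands in for independent hash functions.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Made-up reads for two hypothetical samples.
reads = {"sample_A": ["ACGTACG"], "sample_B": ["CGTAC", "TTTT"]}
sample_counts = {name: count_kmers(rs, 3) for name, rs in reads.items()}
samples, matrix = build_matrix(sample_counts)

filters = {}
for name, counts in sample_counts.items():
    bf = BloomFilter()
    for km in counts:
        bf.add(km)
    filters[name] = bf
```

The matrix keeps abundances, whereas each Bloom filter keeps only presence/absence; a filter never misses a k-mer that was added, but it may occasionally report one that was not.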