2022
DOI: 10.1007/978-3-031-04749-7_34

Lossless Indexing with Counting de Bruijn Graphs

Abstract: High-throughput sequencing data is rapidly accumulating in public repositories. Making this resource accessible for interactive analysis at scale requires efficient approaches for its storage and indexing. There have recently been remarkable advances in solving the experiment discovery problem and building compressed representations of annotated de Bruijn graphs where k-mer sets can be efficiently indexed and interactively queried. However, approaches for representing and retrieving other quantitative attribut…

Cited by 6 publications (21 citation statements)
References 6 publications
“…Lastly in this section, we report that other works [17, 14] considered the multi-document version of the problem studied here, that is, how to retrieve a vector of weights for a query k-mer, where each component of the vector represents the weight of the k-mer in a distinct document. Also such count vectors are usually very “regular” (or can be made so by introducing some approximation) [17] and present runs of equal symbols that can be compressed effectively with run-length encoding (RLE).…”
Section: Related Work (mentioning)
confidence: 99%
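
The run-length encoding idea in the statement above can be illustrated with a minimal sketch. It assumes a count vector is a plain Python list of integers; the function names `rle_encode`, `rle_decode`, and `smooth`, and the rounding rule used as the "approximation", are hypothetical and are not taken from any of the cited works.

```python
from itertools import groupby


def rle_encode(counts):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(counts)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original vector."""
    out = []
    for value, length in pairs:
        out.extend([value] * length)
    return out


def smooth(counts, step):
    """Hypothetical lossy step: round each count down to a multiple of `step`.

    This is only one possible way to make a vector more "regular"; the cited
    works may use a different approximation.
    """
    return [c - c % step for c in counts]


# Example count vector for one k-mer across nine documents (made-up data).
counts = [0, 0, 0, 5, 5, 6, 5, 0, 0]

exact = rle_encode(counts)              # lossless: five runs
approx = rle_encode(smooth(counts, 5))  # lossy: three runs
```

Rounding merges the isolated `6` into the neighbouring run of `5`s, so the approximate vector needs fewer runs, at the cost of no longer decoding to the exact counts.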
“…At constant memory usage, adding the abundance information would yield an extremely high false-positive rate. As such, methods storing abundances mostly rely on compression by clustering abundance with neighbouring k-mers or across datasets, as Reindeer [10] or Counting de Bruijn graphs [6]. These methods do not rely on counting AMQ, but rather on exact data structures.…”
Section: Introduction (mentioning)
confidence: 99%
“…While a sequence graph by itself can be used to check for the presence or absence of a query sequence within a set, it cannot classify or profile the query without additional metadata, called graph annotations. Graph annotations are a key-value store associating each graph node with a number of annotations, where annotations can include the labels of the indexed samples [32,49,28,26], node abundances [33,44], genomic coordinates [33,1,20], geographic coordinates [32], etc. [59].…”
Section: Introduction (mentioning)
confidence: 99%
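
The "key-value store" view of graph annotations described above can be sketched in a few lines. This is a toy illustration, assuming graph nodes are k-mer strings and annotations are sample labels plus per-sample abundances; the names `kmers` and `build_annotations` and the input data are hypothetical, and real tools use compressed representations rather than Python dictionaries.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_annotations(samples, k):
    """Map each k-mer (graph node) to its sample labels and abundances.

    `samples` maps a sample label to a list of reads. A k-mer shared by
    several samples appears only once as a key, with all labels attached.
    """
    annotations = defaultdict(
        lambda: {"labels": set(), "abundance": defaultdict(int)}
    )
    for label, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                entry = annotations[kmer]
                entry["labels"].add(label)
                entry["abundance"][label] += 1
    return annotations


# Made-up reads for two samples.
samples = {
    "sampleA": ["ACGTAC", "ACGT"],
    "sampleB": ["CGTACC"],
}
ann = build_annotations(samples, k=3)
```

Here `ann["CGT"]` carries both sample labels because the k-mer occurs in both samples, while `ann["ACG"]` is labelled only with `sampleA` and records an abundance of 2.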
“…For jointly indexing unassembled read sets, annotated De Bruijn graphs typically scale better than variation graphs in representation size due to the collapse of shared k-mers between samples onto single graph nodes [32]. Before joint graph construction, many indexing tools [32,33,49,58,55,28,59,28] perform error correction (cleaning) on each sample to remove uncertain k-mers [62]. Thus, these joint graphs can represent the samples’ respective assembly graphs [32].…”
Section: Introduction (mentioning)
confidence: 99%