2017
DOI: 10.1101/138016
Preprint

Rainbowfish: A Succinct Colored de Bruijn Graph Representation

Abstract: The colored de Bruijn graph, a variant of the de Bruijn graph which associates each edge (i.e., k-mer) with some set of colors, is an increasingly important combinatorial structure in computational biology. Iqbal et al. demonstrated the utility of this structure for representing and assembling a collection (population) of genomes, and showed how it can be used to accurately detect genetic variants. Muggli et al. introduced VARI, a representation of the colored de Bruijn graph that adopts the BOSS representati…

Citations: cited by 54 publications (88 citation statements)
References: 13 publications (10 reference statements)
“…To encode A, we start by forming a matrix A ∈ {0, 1}^{r×|L|} of sorted unique rows, A t j = A it j . Then we compress A with the flat row-major representation using an RRR vector (named after the initials of the three original authors [20]) as the underlying storage technique and construct a coding vector (i(v) − 1)_{v∈V}, where i(v) maps each node v ∈ V to the index of the row in A corresponding to the labeling of v. The coding vector is represented in a variable-length packed binary coding with a delimiter vector [4] compressed into an RRR vector [20].…”
Section: Binary Relation Representation Schemes
Type: mentioning
confidence: 99%
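
The encoding described in this statement can be illustrated with a small sketch. It is not the Rainbowfish implementation: plain Python lists stand in for the RRR-compressed bit vectors, the function names `encode_rows` and `decode_row` are hypothetical, and ordering the unique rows by decreasing frequency (so that common rows get short codes) is an assumption about what "sorted" means here.

```python
from collections import Counter


def encode_rows(rows):
    """Encode per-node label rows as (unique rows, packed code bits, delimiter bits).

    rows: list of tuples of 0/1, one tuple per node.
    Plain lists are used where a real implementation would use RRR vectors.
    """
    counts = Counter(rows)
    # Assumption: most frequent rows first, so they receive the shortest codes.
    # Ties are broken by row value to keep the result deterministic.
    unique_rows = sorted(counts, key=lambda r: (-counts[r], r))
    index = {row: i for i, row in enumerate(unique_rows)}

    code_bits = []   # variable-length binary codes, concatenated
    delimiters = []  # 1 marks the first bit of each node's code
    for row in rows:
        bits = format(index[row], "b")  # 0-based index, i.e. i(v) - 1
        code_bits.extend(int(b) for b in bits)
        delimiters.extend([1] + [0] * (len(bits) - 1))
    return unique_rows, code_bits, delimiters


def decode_row(v, unique_rows, code_bits, delimiters):
    """Recover the label row of node v (0-based)."""
    # A linear scan stands in for the select operation of a succinct bit vector.
    starts = [pos for pos, bit in enumerate(delimiters) if bit == 1]
    begin = starts[v]
    end = starts[v + 1] if v + 1 < len(starts) else len(code_bits)
    code = int("".join(str(b) for b in code_bits[begin:end]), 2)
    return unique_rows[code]


rows = [(1, 0), (1, 0), (0, 1), (1, 0), (1, 1), (0, 1)]
unique_rows, code_bits, delimiters = encode_rows(rows)
decoded = [decode_row(v, unique_rows, code_bits, delimiters) for v in range(len(rows))]
```

Each distinct row is stored once, and every node stores only a short index into the table of unique rows, which is where the space saving comes from when many nodes share the same label set.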
“…Rainbowfish. The current state-of-the-art for genome graph labeling is a row-major representation of the binary relation matrix A in which an optimal coding is constructed for the set of rows in A [4]. More precisely, let A_{i_1}, .…”
Section: Binary Relation Representation Schemes
Type: mentioning
confidence: 99%
“…Collapsing k-mers that are shared between closely related genomes would decrease both the storage space for the index and the search space for subsequent queries. Recent implementations of such an approach include Mantis [23], Rainbowfish [4] and VARI-Merge [22]. They build joint de Bruijn graphs for multiple genomes, coloring nodes by their source genomes (colored de Bruijn graphs [13]), and can traverse shared paths in the graph which represent conserved regions as well as diverging paths which represent variable regions.…”
Section: Introduction
Type: mentioning
confidence: 99%
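
The idea of coloring k-mers by source genome, as described in this statement, can be sketched in a few lines. This is only an illustration with toy sequences and hypothetical names (`colored_kmers`, `split_shared_variable`). It uses a plain dictionary rather than a succinct structure and ignores reverse complements, which real tools typically handle.

```python
from collections import defaultdict


def colored_kmers(genomes, k):
    """Map each k-mer to the set of genome names (colors) that contain it.

    genomes: dict of name -> sequence string.
    """
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return dict(colors)


def split_shared_variable(colors, n_genomes):
    """Separate k-mers present in every genome from those present in only some."""
    shared = sorted(kmer for kmer, c in colors.items() if len(c) == n_genomes)
    variable = sorted(kmer for kmer, c in colors.items() if len(c) < n_genomes)
    return shared, variable


# Toy input: two sequences that differ at a single position.
genomes = {"g1": "ACGTAC", "g2": "ACGTTC"}
colors = colored_kmers(genomes, k=3)
shared, variable = split_shared_variable(colors, len(genomes))
```

The shared k-mers correspond to the conserved prefix of the two toy sequences and are stored once, while the variable k-mers mark where the sequences diverge.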
“…In the best case, software libraries for building and manipulating de Bruijn graphs are used (Drezen et al., 2014; Crusoe et al., 2015) but in most cases, data structures to index the de Bruijn graph are re-implemented. Those downsides are intensified in the colored de Bruijn graph for which the memory consumption of colors rapidly overtakes the vertices and edges memory usage (Almodaresi et al., 2017). For this reason, a lot of attention has been given to succinct data structures for building the colored de Bruijn graph (Marcus et al., 2014; Holt and McMillan, 2014; Holley et al., 2015; Baier et al., 2016; Muggli et al., 2017; Almodaresi et al., 2017, 2018; Muggli et al., 2019) and data structures for multi-set k-mer indexing (Solomon and Kingsford, 2016; Sun et al., 2018; Solomon and Kingsford, 2018; Pandey et al., 2018; Yu et al., 2018; Bradley et al., 2019).…”
Section: Introduction
Type: mentioning
confidence: 99%