2020
DOI: 10.1101/2020.01.12.903443
Preprint

Simplitigs as an efficient and scalable representation of de Bruijn graphs

Abstract: Motivation: De Bruijn graphs play an essential role in computational biology, facilitating rapid alignment-free comparison of genomic datasets as well as reconstruction of underlying genomic sequences. Subsequently, an important question is how to efficiently represent, compress, and transmit de Bruijn graphs of the most common types of genomic data sets, such as sequencing reads, genomes, and pan-genomes. Results: We introduce simplitigs, an effective representation of de Bruijn graphs for alignment-free applicat…

Cited by 19 publications
(57 citation statements)
References 81 publications
“…The idea of using an SPSS for a membership index was previously independently described in a PhD thesis [12] and questions similar to the ones in our paper are simultaneously and independently studied in [13]. The idea of greedily gluing unitigs (as UST does) has previously appeared in read compression [14], where contigs were greedily constructed from the reads and the reads were stored as alignments to these contigs.…”
Section: Related Work (mentioning)
confidence: 87%
“…It corresponds to SPSS representation {AAACGGA, ACTGGT}. It is easy to verify that this path cover has minimum size, and, by Theorem 1, the corresponding representation has minimum weight (13). (C) Another path cover that could potentially be found by UST.…”
Section: Proof (mentioning)
confidence: 99%
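
The weight quoted in the statement above can be checked by hand: the weight of a string set is its total number of characters. The following is a minimal sketch of that check. The helper names `weight` and `kmers` are hypothetical, and `k = 3` is an assumption chosen only for illustration, since the quoted excerpt does not state the k-mer length.

```python
def weight(strings):
    """Total number of characters over all strings in the set."""
    return sum(len(s) for s in strings)


def kmers(strings, k):
    """List every k-mer occurrence (with repeats) in the given strings."""
    found = []
    for s in strings:
        found.extend(s[i:i + k] for i in range(len(s) - k + 1))
    return found


spss = ["AAACGGA", "ACTGGT"]  # the representation quoted above
k = 3                         # assumed for illustration only

spectrum = kmers(spss, k)

print(weight(spss))                        # total number of nucleotides
print(len(spectrum), len(set(spectrum)))   # occurrences vs. distinct k-mers
```

When every k-mer occurs exactly once, as it does here under the assumed k, the weight equals the number of k-mers plus (k - 1) times the number of strings, which is why fewer strings means lower weight.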
“…So far, we have reviewed the following SPSSs for a set of k-mers X: X itself, the unitigs of X, any set of super-k-mers that together contains all k-mers of X (such as the super-k-mers of the sequencing reads where X originated from), and the super-k-mers of the unitigs of X. To this list we can add the recently-introduced (and equivalent) concepts of UST and simplitigs [16,22]. They are SPSSs that aim to minimize their total number of nucleotides.…”
Section: Spectrum-preserving String Sets In Relation To K-mer Indexing (mentioning)
confidence: 99%
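
To make concrete the idea of a string set that keeps the k-mer set while using few nucleotides, here is a minimal greedy sketch. It is not the UST or simplitig implementation from the cited papers: it assumes a simplified unidirectional model that ignores reverse complements, and the function name `greedy_spss` and the deterministic seed choice are hypothetical.

```python
def greedy_spss(kmer_set, alphabet="ACGT"):
    """Greedily merge overlapping k-mers into longer strings.

    Simplified sketch: reverse complements are ignored, and each k-mer
    ends up in exactly one output string.
    """
    remaining = set(kmer_set)
    result = []
    while remaining:
        seed = min(remaining)  # deterministic seed choice (an assumption)
        remaining.remove(seed)
        k = len(seed)
        s = seed

        # Extend to the right while some unused k-mer overlaps by k-1.
        extended = True
        while extended:
            extended = False
            for c in alphabet:
                candidate = s[len(s) - k + 1:] + c
                if candidate in remaining:
                    remaining.remove(candidate)
                    s += c
                    extended = True
                    break

        # Then extend to the left in the same way.
        extended = True
        while extended:
            extended = False
            for c in alphabet:
                candidate = c + s[:k - 1]
                if candidate in remaining:
                    remaining.remove(candidate)
                    s = c + s
                    extended = True
                    break

        result.append(s)
    return result


example = {"AAA", "AAC", "ACG", "CGG", "GGA", "ACT", "CTG", "TGG", "GGT"}
print(greedy_spss(example))
```

A greedy pass like this does not guarantee the minimum total length in general; it only illustrates why gluing overlapping k-mers reduces the number of stored nucleotides compared with listing each k-mer separately.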
“…KMC [19]). Note: unlike all others, this SPSS represents the multiset of k-mers (with duplicates) from the reads, not a set of distinct k-mers.
super-k-mers of unitigs: same as above, except substrings of unitigs instead of substrings of reads (e.g. BLight [17]).
UST [16]: set of sequences obtained by greedily concatenating unitigs in order to minimize the total number of nucleotides in the SPSS.
simplitigs [22]: similar to [16].
monotigs: a set of paths that covers the (uncompacted) de Bruijn graph such that all k-mers have an identical count-vector and minimizer.
Table 1: Categories of spectrum-preserving string set schemes known from previous literature (and monotigs, introduced in this article). See also Fig.…”
Section: Spss Scheme (mentioning)
confidence: 99%