2015
DOI: 10.1371/journal.pone.0133198

Indexing Arbitrary-Length k-Mers in Sequencing Reads

Abstract: We propose a lightweight data structure for indexing and querying collections of NGS reads data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), based on finding overlapping reads, is competitive to the existing algorithms in the space use, query times, or both. The main applications of our index include variant calling, error correction and analysis …

Cited by 18 publications (10 citation statements)
References 33 publications
“…HG-CoLoR: Similar to FMLRC, it avoids using a fixed k-mer size for the de Bruijn graph. Accordingly, it relies on a variable-order de Bruijn graph structure [42]. It also uses a seed-and-extend approach to align long reads to the graph.…”
Section: Short-read-assembly-based Methods (mentioning)
confidence: 99%
“…After the alignment step, a variable order de Bruijn graph is built from the solid k-mers of the corrected short reads. Unlike FMLRC, this graph is built with the help of PgSA [37]. Moreover, HG-CoLoR allows to explore every order of the graph, between a minimum order k and a maximum order K, instead of limiting the graph explorations to two different orders.…”
Section: HG-CoLoR (2018) (mentioning)
confidence: 99%
“…The proposed PgRC is based on a few ideas. The key one is an approximation of the shortest common superstring over a set of the given reads, which we call a "pseudogenome" (hence the name of our tool), an idea basically described in (Kowalski et al, 2015). In this work, however, we modify the procedure from our earlier research, by partitioning the read set into groups, related to their quality and the existence of N symbols in them.…”
Section: Overview (mentioning)
confidence: 99%
“…More concretely, we followed the pseudogenome construction algorithm (Kowalski et al, 2015). Given a read array…”
Section: Read Partitioning and Pseudogenome Generation (mentioning)
confidence: 99%