2019
DOI: 10.1101/2019.12.20.884924
Preprint

Distance Indexing and Seed Clustering in Sequence Graphs

Abstract: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but much more complicated in a graph context. In read mapping algorithms, distance calculations are commo…

Cited by 2 publications (3 citation statements)
References 18 publications
“…The clustering algorithm in vg mpmap is built around the distance index described in [71]. In brief, this index can query the minimum distance between two positions in the pangenome graph by expressing the distance as the sum of a small number of precomputed distances.…”
Section: Methods (mentioning)
confidence: 99%
“…Edges that are much longer than the read length are not added; this avoids treating distal elements on the same chromosome as part of the same cluster. In addition, we accelerate this process using Algorithm 3 from [71], which partitions seeds into equivalence classes based on the distance between them. The equivalence relation is the transitive closure of the relation of being connected by a path of length less than d, which is a tunable parameter.…”
Section: Methods (mentioning)
confidence: 99%
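
The equivalence relation quoted above (the transitive closure of "connected by a path of length less than d") can be illustrated with a small union-find sketch. This is not Algorithm 3 from the cited preprint, which relies on a graph distance index. It is a naive pairwise version under stated assumptions: `cluster_seeds`, `min_distance`, and `linear_distance` are hypothetical names, and the toy distance function treats seeds as integer positions on a linear sequence rather than positions in a sequence graph.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Set


def cluster_seeds(
    seeds: Sequence[Hashable],
    min_distance: Callable[[Hashable, Hashable], Optional[int]],
    d: int,
) -> List[Set[Hashable]]:
    """Partition seeds under the transitive closure of 'minimum distance < d'.

    min_distance is a caller-supplied stand-in for a distance-index query;
    it may return None when no path connects the two seeds.
    """
    parent = list(range(len(seeds)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs comparison (quadratic in the number of seeds).
    for i in range(len(seeds)):
        for j in range(i + 1, len(seeds)):
            dist = min_distance(seeds[i], seeds[j])
            if dist is not None and dist < d:
                parent[find(i)] = find(j)  # merge the two classes

    clusters: Dict[int, Set[Hashable]] = {}
    for i, seed in enumerate(seeds):
        clusters.setdefault(find(i), set()).add(seed)
    return list(clusters.values())


def linear_distance(a: int, b: int) -> int:
    # Toy assumption: seeds are offsets on a single linear sequence.
    return abs(a - b)


clusters = cluster_seeds([10, 14, 17, 100, 103], linear_distance, d=5)
```

In this toy input, seeds 10 and 17 are 7 apart, which is not less than d, yet they fall in the same class because each is within d of seed 14. That chaining is what the transitive closure contributes.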
“…Given the additional complexities and overheads of processing a genome graph instead of a linear reference genome, graph-based analysis exacerbates the bottlenecks of read-to-reference mapping. Due to the nascent nature of sequence-to-graph mapping, a much smaller number of software tools (and no hardware accelerators) exist for sequence-to-graph mapping [36,54,61,66,77–83] compared to the traditional sequence-to-sequence mapping.…”
Section: Introduction (mentioning)
confidence: 99%