2022
DOI: 10.1101/2022.08.29.505691
Preprint

Sequence to graph alignment using gap-sensitive co-linear chaining

Abstract: Co-linear chaining is a widely used technique in sequence alignment tools that follow seed-filter-extend methodology. It is a mathematically rigorous approach to combine small exact matches. For co-linear chaining between two sequences, efficient subquadratic-time chaining algorithms are well-known for linear, concave and convex gap cost functions [Eppstein et al. JACM'92]. However, developing extensions of chaining algorithms for DAGs (directed acyclic graphs) has been challenging. Recently, a new sparse dyna…

Cited by 8 publications (44 citation statements)
References 59 publications
“…The chaining algorithm we present does not take into account any genomic coordinate or graph traversal information, as is typically done with co-linear chaining algorithms [39,33,2,42,9]. Since coordinates are only defined for the input sequences, applying this technique for coordinates assigned to contigs from low-coverage assembly graphs would produce short alignments, as demonstrated by minigraph during our evaluation.…”
Section: Discussion (mentioning)
confidence: 99%
“…Finally, anchor extension is the alignment of the query to the graph via forward and backward search from the ends of each anchor. A common anchor filtration method is co-linear chaining [38,39,2,33,42,9], a dynamic programming algorithm that finds high-scoring anchor chains. A chain is a series of anchors that appear in the correct order with respect to the query such that each anchor can reach the subsequent anchor in the chain via graph traversal.…”
Section: Introduction (mentioning)
confidence: 99%
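The chain definition quoted above can be illustrated with a minimal sketch. It is not the algorithm of the cited preprint: it ignores gap costs, uses a simple quadratic dynamic program, and assumes each anchor lies entirely within a single node of the DAG. The names `Anchor`, `descendants`, and `best_chain` are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    q_start: int  # start of the exact match in the query
    q_end: int    # end of the match in the query (exclusive)
    node: int     # DAG node containing the match (simplifying assumption)


def descendants(dag):
    """Map every node to the set of nodes reachable from it by at least one edge."""
    memo = {}

    def visit(u):
        if u not in memo:
            out = set()
            for v in dag.get(u, ()):
                out.add(v)
                out |= visit(v)
            memo[u] = out
        return memo[u]

    nodes = set(dag) | {v for succ in dag.values() for v in succ}
    return {u: visit(u) for u in nodes}


def best_chain(anchors, dag):
    """Return (score, anchor indices) of a chain maximizing total anchor length.

    Anchor a may precede anchor b if a ends in the query before b starts
    and b's node is reachable from a's node by graph traversal.
    """
    if not anchors:
        return 0, []
    reach = descendants(dag)
    # Any valid predecessor has a smaller q_start, so this order is safe.
    order = sorted(range(len(anchors)), key=lambda i: anchors[i].q_start)
    score = [a.q_end - a.q_start for a in anchors]
    prev = [-1] * len(anchors)
    for pos, j in enumerate(order):
        for i in order[:pos]:
            a, b = anchors[i], anchors[j]
            if a.q_end <= b.q_start and b.node in reach.get(a.node, set()):
                cand = score[i] + (b.q_end - b.q_start)
                if cand > score[j]:
                    score[j], prev[j] = cand, i
    end = max(range(len(anchors)), key=score.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(k)
        k = prev[k]
    return score[end], chain[::-1]
```

A gap-sensitive formulation would subtract a penalty that depends on the query and graph distance between consecutive anchors. This sketch leaves that term out to keep the ordering and reachability conditions in focus.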
“…Unfortunately, even finding an exact occurrence of a query string as a subpath in a graph is a conditionally hard problem [12,13]: only quadratic time dynamic programming solutions are known and faster algorithms would contradict the Strong Exponential Time Hypothesis (SETH). Due to these difficulties, researchers have aimed at finding parameterized solutions to such alignment problems [3,10,9,22,21,6,24,26], and/or separating the task into finding short exact occurrences (anchors) and then chaining them into longer matches. Colinear chaining has been effectively applied as a heuristic to the original alignment problem.…”
Section: Introduction (mentioning)
confidence: 99%
“…For the case with two strings as input, a recent formulation of co-linear chaining [19] captures unit cost edit distance. There has been an attempt to extend the results to graphs considering gap costs [6], but it appears difficult to make such formulation fully symmetric (due to there being exponential many paths between two anchors).…”
Section: Introduction (mentioning)
confidence: 99%
“…Since the time complexity of optimal sequence-to-graph alignment grows linearly with the number of edges in the graph [20,16], many approaches instead follow an approximate seed-and-extend strategy [2], which operates in four main steps: i) seed extraction, which in its simplest form involves finding all substrings with a certain length, ii) seed anchoring, finding matching nodes in the graph, iii) seed filtration, often involving clustering [9,37] or co-linear chaining [25,1,32,8] of seeds, and iv) seed extension, involving performing semi-global pairwise sequence alignment forwards and backwards from each anchored seed [28]. We will review the usage of exact seeds utilized in tools such as vg [15] and GraphAligner [37] and discuss their limitations in a high mutation-rate setting.…”
Section: Introduction (mentioning)
confidence: 99%
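The first two steps described in the statement above, seed extraction and seed anchoring, can be sketched as follows. This is an illustrative simplification and not the method of vg, GraphAligner, or the citing paper: seeds are all length-k substrings of the query, and a seed is anchored only when it occurs entirely inside one node label, so matches spanning an edge are missed. The function names and the `node_labels` mapping are hypothetical.

```python
from collections import defaultdict


def extract_seeds(query, k):
    """Step i: every length-k substring of the query, with its query position."""
    return [(i, query[i:i + k]) for i in range(len(query) - k + 1)]


def index_nodes(node_labels, k):
    """Index every length-k substring of each node label as (node, offset)."""
    index = defaultdict(list)
    for node, label in node_labels.items():
        for off in range(len(label) - k + 1):
            index[label[off:off + k]].append((node, off))
    return index


def anchor_seeds(query, node_labels, k):
    """Step ii: list (query position, node, node offset) for each exact seed hit."""
    index = index_nodes(node_labels, k)
    return [
        (q_pos, node, off)
        for q_pos, kmer in extract_seeds(query, k)
        for node, off in index.get(kmer, [])
    ]
```

The resulting anchors would then feed a filtration step such as clustering or co-linear chaining, followed by extension from each surviving anchor.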