2020
DOI: 10.1007/978-3-030-45257-5_7

AStarix: Fast and Optimal Sequence-to-Graph Alignment

Abstract: We present an algorithm for the optimal alignment of sequences to genome graphs. It works by phrasing the edit distance minimization task as finding a shortest path on an implicit alignment graph. To find a shortest path, we instantiate the A* paradigm with a novel domain-specific heuristic function that accounts for the upcoming subsequence in the query to be aligned, resulting in a provably optimal alignment algorithm called AStarix. Experimental evaluation of AStarix shows that it is 1-2 orders of magnitude …
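The shortest-path formulation in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the edge-labeled `Graph` type, the function `align_cost`, unit edit costs, and the pluggable `heuristic` parameter. The default heuristic is zero, which reduces A* to Dijkstra's algorithm; it is not the domain-specific heuristic of AStarix, which this page does not describe in detail.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: node -> list of (successor, edge letter).
Graph = Dict[int, List[Tuple[int, str]]]


def align_cost(graph: Graph, query: str,
               heuristic: Callable[[int, int], int] = lambda node, i: 0) -> int:
    """Minimum edit cost of aligning `query` to any path in `graph`.

    A state (u, i) means: standing at node u with query[:i] already aligned.
    States and edges are generated on demand (an implicit alignment graph).
    `heuristic` must be admissible (never overestimate the remaining cost).
    """
    nodes = set(graph) | {v for outs in graph.values() for v, _ in outs}
    best = {(u, 0): 0 for u in nodes}  # alignment may start at any node
    frontier = [(heuristic(u, 0), 0, u, 0) for u in nodes]
    heapq.heapify(frontier)
    while frontier:
        _, cost, u, i = heapq.heappop(frontier)
        if cost > best.get((u, i), float("inf")):
            continue  # stale queue entry
        if i == len(query):
            return cost  # whole query aligned
        steps = [(u, i + 1, 1)]  # insertion: query letter has no graph letter
        for v, letter in graph.get(u, []):
            steps.append((v, i, 1))  # deletion: graph letter is skipped
            steps.append((v, i + 1, 0 if letter == query[i] else 1))  # match/substitution
        for v, j, weight in steps:
            new_cost = cost + weight
            if new_cost < best.get((v, j), float("inf")):
                best[(v, j)] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(v, j), new_cost, v, j))
    raise ValueError("graph has no nodes")
```

A better-informed admissible heuristic would expand fewer states while returning the same cost, which is the general appeal of A* over plain Dijkstra.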

Cited by 15 publications (11 citation statements)
References 37 publications

“…From a state ⟨i, j⟩ where a_i = b_j, it is sufficient to only consider the matching edge to ⟨i+1, j+1⟩ (Allison, 1992; Ivanov et al., 2020), and ignore the insertion and deletion edges to ⟨i, j+1⟩ and ⟨i+1, j⟩. During alignment, we greedily match as many letters as possible within the current seed before inserting only the last open state in the priority queue, but we do not cross seed boundaries in order to not interfere with match pruning.…”
Section: Pseudocode
Citation type: mentioning
confidence: 99%
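The greedy-matching idea in the statement above can be sketched for plain pairwise edit distance. This is a simplified illustration, not the citing paper's implementation: the function name `edit_distance_greedy` is hypothetical, indices are 0-based, the search is Dijkstra with unit costs, and seeds and match pruning are left out entirely.

```python
import heapq


def edit_distance_greedy(a: str, b: str) -> int:
    """Unit-cost edit distance via shortest path with greedy matching.

    From a state (i, j) with a[i] == b[j], only the free matching edge is
    followed; insertion and deletion edges are not generated there.
    """
    n, m = len(a), len(b)
    best = {(0, 0): 0}
    frontier = [(0, 0, 0)]  # (cost, i, j)
    while frontier:
        cost, i, j = heapq.heappop(frontier)
        if cost > best.get((i, j), float("inf")):
            continue  # stale queue entry
        # Greedy step: slide down the diagonal while letters match (cost 0).
        while i < n and j < m and a[i] == b[j]:
            i += 1
            j += 1
        if i == n and j == m:
            return cost
        # Only the last state of the matching run is expanded.
        for ni, nj in ((i + 1, j + 1), (i + 1, j), (i, j + 1)):
            if ni <= n and nj <= m:
                new_cost = cost + 1
                if new_cost < best.get((ni, nj), float("inf")):
                    best[(ni, nj)] = new_cost
                    heapq.heappush(frontier, (new_cost, ni, nj))
    raise AssertionError("unreachable: the end state is always reachable")
```

Intermediate states along a run of matches never enter the priority queue, which is where the saving comes from.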
“…If allowing for approximate string matching under the minimum number of edits in the sequence, then the problem can be solved in quadratic time (Amir et al 2000), and extensions to consider affine gap costs (Jain et al 2020) and various practical optimizations were developed later. These practical optimizations were implemented into fast exact aligners such as (Jain et al 2019), (Rautiainen et al 2019, Rautiainen and Marschall 2020), and (Ivanov et al 2020, 2021).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In essence, it is a modified version of the common sequence alignment with dynamic programming (DP) algorithms, where all the incoming edges connecting a certain node in the graph to other nodes are considered while calculating the cell’s score to find the best path of the sequence through the graph. In recent years, several tools have been introduced to perform sequence-to-graph alignments with better speeds and accuracy (Ivanov et al 2020, Rautiainen and Marschall 2020, Sirén et al 2021).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
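The DP variant described in the last statement, where a cell's score takes every incoming edge into account, might look roughly like the sketch below. The assumptions are mine: a node-labeled DAG whose nodes are numbered in topological order, unit edit costs, an alignment that may start and end at any node, and the hypothetical names `align_to_dag`, `labels`, and `preds`.

```python
from typing import List, Sequence


def align_to_dag(labels: Sequence[str], preds: Sequence[Sequence[int]],
                 query: str) -> int:
    """Minimum unit-cost edits to align all of `query` to some path in a DAG.

    labels[v] is the letter of node v; preds[v] lists the predecessors of v.
    Nodes are assumed to be numbered in topological order (every p in
    preds[v] satisfies p < v).
    """
    m = len(query)
    # Virtual start row: j query letters inserted before entering the graph.
    start = list(range(m + 1))
    rows: List[List[int]] = []
    for v, letter in enumerate(labels):
        # Column-wise minimum over the start row and every incoming edge.
        incoming = [start] + [rows[p] for p in preds[v]]
        prev = [min(column) for column in zip(*incoming)]
        row = [prev[0] + 1]  # empty query prefix: the node letter is deleted
        for j in range(1, m + 1):
            row.append(min(
                prev[j - 1] + (letter != query[j - 1]),  # match or substitution
                prev[j] + 1,                             # deletion of node letter
                row[j - 1] + 1,                          # insertion of query letter
            ))
        rows.append(row)
    # The path may end at any node (or be empty, which costs m insertions).
    return min(row[m] for row in [start] + rows)
```

For a linear chain, where each node has exactly one predecessor, this reduces to the familiar row-by-row DP for two sequences.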