2020
DOI: 10.1101/2020.01.22.915496
Preprint

AStarix: Fast and Optimal Sequence-to-Graph Alignment

Abstract: We present an algorithm for the optimal alignment of sequences to genome graphs. It works by phrasing the edit distance minimization task as finding a shortest path on an implicit alignment graph. To find a shortest path, we instantiate the A* paradigm with a novel domain-specific heuristic function that accounts for the upcoming subsequence in the query to be aligned, resulting in a provably optimal alignment algorithm called AStarix. Experimental evaluation of AStarix shows that it is 1-2 orders of magnitude …
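
The shortest-path formulation in the abstract can be illustrated with a minimal sketch. This is not the AStarix implementation: the graph encoding (a dict of letter-labelled edges), the function name `align`, and the unit edit costs are assumptions made for illustration, and the default heuristic is the trivial zero function (which reduces A* to Dijkstra) rather than the paper's domain-specific heuristic.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: node -> list of (next_node, edge_letter).
Graph = Dict[int, List[Tuple[int, str]]]


def align(graph: Graph, query: str,
          h: Callable[[int, int], int] = lambda v, i: 0) -> int:
    """Minimum edit distance between `query` and any path in `graph`.

    A state (v, i) means "at node v with i query letters consumed".
    `h` must never overestimate the remaining cost and should be
    consistent; the zero default is an assumption, not the heuristic
    described in the abstract.
    """
    n = len(query)
    # Every graph node is a possible alignment start.
    dist = {(v, 0): 0 for v in graph}
    heap = [(h(v, 0), 0, v, 0) for v in graph]
    heapq.heapify(heap)
    while heap:
        _, g, v, i = heapq.heappop(heap)
        if g > dist.get((v, i), float("inf")):
            continue  # stale queue entry
        if i == n:
            return g  # whole query consumed: optimal cost found
        # Insertion: consume a query letter without moving in the graph.
        successors = [(v, i + 1, 1)]
        for u, letter in graph.get(v, []):
            # Match (cost 0) or substitution (cost 1) along an edge.
            successors.append((u, i + 1, 0 if letter == query[i] else 1))
            # Deletion: follow an edge without consuming a query letter.
            successors.append((u, i, 1))
        for u, j, w in successors:
            ng = g + w
            if ng < dist.get((u, j), float("inf")):
                dist[(u, j)] = ng
                heapq.heappush(heap, (ng + h(u, j), ng, u, j))
    raise ValueError("graph has no nodes")


# Toy graph spelling A, then C or T, then G.
EXAMPLE_GRAPH: Graph = {0: [(1, "A")], 1: [(2, "C"), (2, "T")], 2: [(3, "G")], 3: []}
print(align(EXAMPLE_GRAPH, "ACG"))  # exact match along one path
print(align(EXAMPLE_GRAPH, "AGG"))  # needs one substitution
```

The alignment graph is never materialised: successor states are generated on demand, which is what "implicit" refers to in the abstract. A more informative heuristic only changes how many states are explored, not the returned cost.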

Citations: Cited by 11 publications (23 citation statements)
References: References 36 publications
“…Considering all nodes v ∈ V_r as possible starting points for the alignment means that the A* algorithm would explore all states of the form ⟨v, 0⟩, which immediately induces a high overhead of |V_r|. In line with previous works [10,12], we avoid this overhead by complementing the reference graph with a trie index.…”
Section: Trie Index (mentioning)
confidence: 97%
“…Implementation. The seed heuristic and prefix heuristic reuse the same free and open source C++ codebase of the AStarix aligner [10]. It includes a simple implementation of a graph and trie data structure which is not optimized for memory usage.…”
Section: Implementation and Parameter Choices (mentioning)
confidence: 99%
“…Alongside these, there has also been growing interest in the use of de Bruijn graph-based indexes for alignment tasks as a way to accelerate alignment to repeat-prone reference genomes [41] or to unassembled read sets [40, 28]. More recent work has focused on improving the scalability of these approaches, either through strategies using more rigorous early cut-off criteria [33], or via the introduction of heuristics [55]. A major challenge faced by all existing methods is to unite the ability to efficiently operate on petabase scale input data with the capacity for fast and versatile query operations.…”
Section: Introduction (mentioning)
confidence: 99%
“…Graph representations more accurately reflect the sampled individuals within a population, and their use in genome mapping algorithms reduces reference bias and increases mapping accuracy when sequencing a new individual (Ballouz et al., 2019). There is abundant research on data structures designed for graph representations of genomes and pan-genomes (Garrison et al., 2018; Li et al., 2020), their space-efficient indexing (Chang et al., 2020; Ghaffaari and Marschall, 2019; Holley et al., 2016; Jain et al., 2019b; Kuhnle et al., 2020; Marcus et al., 2014; Sirén et al., 2014) and alignment algorithms (Darby et al., 2020; Ivanov et al., 2020; Jain et al., 2020; Kuosmanen et al., 2018; Rautiainen and Marschall, 2020) to map sequences to reference graphs. For review papers summarizing these developments, see Computational Pan-Genomics Consortium (2018), Eizenga et al. (2020), and Paten et al. (2017).…”
Section: Introduction (mentioning)
confidence: 99%