Abstract: Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). To achieve this, acyclic extensions of the input graphs are first co…
“…Many of these, however, are designed only for specific types of genome graphs, such as de Bruijn graphs [24,11,23] and variation graphs [9]. A compromise often made when aligning sequences to cyclic graphs using algorithms reliant on directed acyclic graphs involves the computationally expensive "DAG-ification" of graph regions [18,9].…”
Section: Related Work (mentioning)
confidence: 99%
“…, $e_k$) in $G_r$ induces a spelling $\sigma(\pi) \in \Sigma^*$ defined by $\sigma(e_1) \cdots \sigma(e_k)$, where $\sigma(e_i)$ is the label of edge $e_i$ and $\Sigma^* := \bigcup_{k \in \mathbb{N}} \Sigma^k$. We note that our approach naturally handles cyclic walks and does not require cycle unrolling, a feature shared with GraphAligner [27] and BrownieAligner [11] but missing from vg [9], PaSGAL [15] and V-ALIGN [18].…”
Section: Task Description: Alignment To Reference Graphs (mentioning)
We present an algorithm for the optimal alignment of sequences to genome graphs. It works by phrasing the edit distance minimization task as finding a shortest path on an implicit alignment graph. To find a shortest path, we instantiate the A* paradigm with a novel domain-specific heuristic function that accounts for the upcoming subsequence in the query to be aligned, resulting in a provably optimal alignment algorithm called AStarix. Experimental evaluation of AStarix shows that it is 1-2 orders of magnitude faster than state-of-the-art optimal algorithms on the task of aligning Illumina reads to reference genome graphs. Implementations and evaluations are available at https://github.com/eth-sri/astarix.
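The shortest-path formulation described in this abstract can be illustrated with a minimal sketch. It is not the AStarix implementation: the function name `align_to_graph`, the dictionary-based graph encoding, and the unit edit costs are assumptions made for illustration, and the default heuristic is zero, which reduces A* to Dijkstra's algorithm rather than using the domain-specific heuristic the paper proposes. Each search state pairs a graph vertex with the number of query characters consumed, so cyclic graphs are handled without unrolling.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical encoding: vertex -> list of (successor, one-character edge label).
Graph = Dict[int, List[Tuple[int, str]]]


def align_to_graph(
    graph: Graph,
    query: str,
    heuristic: Callable[[int, int], int] = lambda v, i: 0,
) -> Optional[int]:
    """Return the minimum unit-cost edit distance between `query` and the
    spelling of any walk in `graph`; the walk may start at any key of `graph`.

    `heuristic(v, i)` must never overestimate the remaining cost (admissible).
    The zero default is an assumption of this sketch and gives plain Dijkstra.
    """
    n = len(query)
    best: Dict[Tuple[int, int], int] = {}
    heap: List[Tuple[int, int, int, int]] = []  # (g + h, g, vertex, query index)

    # Every vertex is a free starting point for the alignment.
    for v in graph:
        best[(v, 0)] = 0
        heapq.heappush(heap, (heuristic(v, 0), 0, v, 0))

    while heap:
        _, g, v, i = heapq.heappop(heap)
        if g > best.get((v, i), g):
            continue  # stale queue entry, a cheaper path was found later
        if i == n:
            return g  # whole query consumed: first goal popped is optimal

        # Query character with no graph counterpart: stay at v, cost 1.
        moves = [(v, i + 1, 1)]
        for w, label in graph.get(v, []):
            # Match (cost 0) or substitution (cost 1) along the edge v -> w.
            moves.append((w, i + 1, 0 if label == query[i] else 1))
            # Graph character with no query counterpart: follow edge, cost 1.
            moves.append((w, i, 1))

        for w, j, cost in moves:
            candidate = g + cost
            if candidate < best.get((w, j), candidate + 1):
                best[(w, j)] = candidate
                heapq.heappush(heap, (candidate + heuristic(w, j), candidate, w, j))

    return None  # only reached when the graph has no vertices
```

The alignment graph is never materialized: states are generated on demand from the reference graph and the query, which is what "implicit" refers to in the abstract. Supplying a tighter admissible `heuristic` would let the search skip states that the zero heuristic explores.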
“…Although POA is defined only for acyclic graphs, it can be extended to cyclic graphs by unfolding cyclic components, which is the approach taken by the VG toolkit [16] and ExpansionHunter [9]. The practical efficiency of this unfolding depends on the read length and the graph topology and complex cyclic areas can lead to very large unfolded graphs [20]. V-Align [20] aligns to cyclic graphs but its runtime depends on the graph's feedback vertex set size.…”
Section: Introduction (mentioning)
confidence: 99%
“…The practical efficiency of this unfolding depends on the read length and the graph topology and complex cyclic areas can lead to very large unfolded graphs [20]. V-Align [20] aligns to cyclic graphs but its runtime depends on the graph's feedback vertex set size. Some tools use heuristic approaches for aligning to de Bruijn graphs using depth-first search [6,21,8].…”
Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pan-genome graph. Yet, so far this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes. When employing GraphAligner for error correction, we find it to be almost 3x more accurate and over 15x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
“…Many popular short read assemblers [9,10,11] provide the user not only with a set of contig sequences, but also with assembly graphs, encoding the information on the potential adjacencies of the assembled sequences. Naturally arising problem of sequence-to-graph alignment has been a topic of many recent studies [2,3,5,6,7,8,13,14]. Identifying alignments of long error-prone reads (such as Pacbio and ONT reads) to assembly graphs is particularly important and has recently been applied to hybrid genome assembly [1,4], read error correction [12], and haplotype separation [3].…”
this data a statistically significant divergence with the model was detected. At the same time, no divergence was detected for diseases not related to onco-hematology. This experiment has shown that a V-D/D-J junction length distribution in Ig repertoire may be used as an indicator of the presence of pathological clones in a B-cell population. The possibility of the model application as an early predictor of various diseases presents a significant interest for further research.