2023
DOI: 10.1093/bioinformatics/btad460

Chaining for accurate alignment of erroneous long reads to acyclic variation graphs

Abstract: Motivation: Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improving variant calling. While the vg toolkit (Garrison et al., 2018) is a popular aligner of short reads, GraphAligner (Rautiainen and Marschall, 2020) is the state-of-the-art aligner of erroneous long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled…

Cited by 10 publications (10 citation statements)
References 40 publications
“…For instance, Minigraph (Li et al 2020) is robust in mapping sequences directly onto the pangenome graph, while Vg Giraffe (Sirén et al 2021) is noted for its effectiveness in aligning short-read sequencing data to the pangenome graph. Additionally, tools like GraphChainer (Ma et al 2023) and GraphAligner (Rautiainen and Marschall 2020) offer advantages in aligning long-read data, demonstrating the tailored functionalities of these algorithms in addressing different aspects of pangenome analysis. Currently, three approaches are available for generating different types of pangenome graphs, including the genetic variation graph, reference-based pangenome graph, and reference-unbiased pangenome graph.…”
Section: Different Plant Pangenome Construction Pipelines and Strategies
mentioning
confidence: 99%
“…There are also several areas for improvement that are beyond the scope of this work. For instance, we have not implemented any seed filtering techniques such as colinear chaining (Li et al 2020; Almodaresi et al 2021; Karasikov et al 2022; Chandra and Jain 2023; Ma et al 2023). These approaches use a cut-off threshold rather than the number of neighbors for anchoring, which can reduce the number of anchors in sparse areas of the sketch space and decrease the number of false matches.…”
Section: Future Study Of Mg-sketch
mentioning
confidence: 99%
“…Because the time complexity of optimal sequence-to-graph alignment grows linearly with the number of edges in the graph (Jain et al 2020; Gibney et al 2022), many approaches instead follow an approximate seed-and-extend strategy (Altschul et al 1990), which operates in four main steps: (1) seed extraction, which in its simplest form involves finding all substrings with a certain length; (2) seed anchoring, finding matching nodes in the graph, (3) seed filtration, often involving clustering (Chang et al 2020; Rautiainen and Marschall 2020) or colinear chaining (Almodaresi et al 2021; Karasikov et al 2022; Chandra and Jain 2023; Ma et al 2023) of seeds, and (4) seed extension, involving performing semiglobal pairwise sequence alignment forward and backward from each anchored seed (Li 2013). We will review the usage of exact seeds used in tools such as vg (Garrison et al 2018) and GraphAligner (GA) (Rautiainen and Marschall 2020) and discuss their limitations in a high mutation-rate setting.…”
mentioning
confidence: 99%
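
As a rough illustration of steps (1) to (3) in the statement above, here is a minimal Python sketch. It is not the method of GraphChainer or of any cited tool: it assumes a plain linear reference string in place of a variation graph, exact k-mer seeds, and a simple quadratic dynamic program that maximizes the number of chained anchors rather than read coverage. The function names `extract_seeds`, `anchor_seeds`, and `chain_anchors` are hypothetical.

```python
from collections import defaultdict


def extract_seeds(read, k):
    """Step 1 (assumed simplest form): every length-k substring with its read offset."""
    return [(i, read[i:i + k]) for i in range(len(read) - k + 1)]


def anchor_seeds(seeds, reference, k):
    """Step 2: exact seed matches in a linear reference (a stand-in for graph nodes)."""
    index = defaultdict(list)
    for j in range(len(reference) - k + 1):
        index[reference[j:j + k]].append(j)
    # Each anchor is (read offset, reference offset), sorted by read offset.
    return sorted((i, j) for i, kmer in seeds for j in index.get(kmer, []))


def chain_anchors(anchors):
    """Step 3: longest colinear chain, where both offsets strictly increase."""
    if not anchors:
        return []
    best = [1] * len(anchors)   # best chain length ending at each anchor
    prev = [-1] * len(anchors)  # back-pointer to the previous anchor in that chain
    for b in range(len(anchors)):
        for a in range(b):
            colinear = anchors[a][0] < anchors[b][0] and anchors[a][1] < anchors[b][1]
            if colinear and best[a] + 1 > best[b]:
                best[b] = best[a] + 1
                prev[b] = a
    end = max(range(len(anchors)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    k = 3
    seeds = extract_seeds("ACGTGCA", k)
    anchors = anchor_seeds(seeds, "TTTACGTGCATTT", k)
    print(chain_anchors(anchors))
```

Step (4), seed extension, would then run a pairwise alignment only around the anchors that survive chaining; it is omitted here. Practical chainers typically replace the quadratic loop with a sparse dynamic program and score chains by coverage or gap cost.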
“…Extending alignment between sequences to sequence-to-graph alignment is an emerging and central challenge of computational pangenomics [12], as labeled graphs are a popular representation of pangenomes used in recent bioinformatics tools [ 13 16 ]. We assume that a labeled graph ( ) is the reference pangenome of interest.…”
Section: Introduction
mentioning
confidence: 99%
“…To circumvent this difficulty, research efforts have concentrated on finding parameterized solutions to (exact) pattern matching in labeled graphs [19–22]. Moreover, the use of MEMs and co-linear chaining has also been extended to graphs [ 13 16 ].…”
Section: Introduction
mentioning
confidence: 99%