CHOP: Haplotype-aware path indexing in population graphs

Mokveld, Tom; Linthorst, Jasper; Al-Ars, Zaid; Reinders, Marcel J. T.

doi:10.1101/305268

Cited by 9 publications

(8 citation statements)

References 32 publications

(20 reference statements)


“…Also like FORGe, we showed that aligning to a super-population-matched major-allele reference did not substantially improve alignment accuracy compared to a global major-allele reference combining all super populations. Our results also reinforce that a linear aligner can be extended to incorporate variants and exhibit similar accuracy to a graph aligner [16,31].…”

Section: Discussion (supporting)

confidence: 78%

“…This might be accomplished using unsupervised, sequence-driven clustering methods [34,35], using the "founder sequence" framework [36,37], or using some form of submodular optimization [38]. A more radical idea is to simply index all available individuals, forgoing the need to choose representatives; this is becoming more practical with the advent of new approaches for haplotype-aware path indexing [31] and efficient indexing for repetitive texts [39].…”

Section: Discussion (mentioning)

confidence: 99%


Reference flow: reducing reference bias using multiple population genomes

et al. 2021


Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.



“…This might be accomplished using unsupervised, sequence-driven clustering methods [36,37], using the "founder sequence" framework [38,39], or using some form of submodular optimization [40]. A more radical idea is to simply index all available individuals, forgoing the need to choose representatives; this is becoming more practical with the advent of new approaches for haplotype-aware path indexing [33] and efficient indexing for repetitive texts [41].…”

Section: Discussion (mentioning)

confidence: 99%

Reducing reference bias using multiple population reference genomes

Chen, Solomon, Mun, et al. 2020

Preprint


Most sequencing data analyses start by aligning sequencing reads to a linear reference genome. But failure to account for genetic variation causes reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the "reference flow" alignment method that uses information from multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow exhibits a similar level of accuracy and bias avoidance, but with 13% of the memory footprint and 6 times the speed.


“…Indexing variation graphs is challenging because the number of possible paths can be exponential in the number of variants encoded. Typical approaches to handle this problem are to index only some of the variation by limiting the indexed paths either heuristically [16,27,28] or by using panels of known haplotypes [29,30]. A recent method avoids the exponential blowup by dynamically indexing the graph and the reads, thereby exploiting that there can be exponentially many paths in the graphs, but not in the set of reads to be queried [31].…”

Section: Introduction (mentioning)

confidence: 99%
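The path explosion described in the statement above can be illustrated with a small sketch. It is not taken from CHOP, GraphAligner, or any other cited tool. It assumes a simplified graph that is a chain of independent variant sites, and the function names (`count_graph_paths`, `enumerate_graph_paths`, `distinct_haplotype_paths`) are hypothetical. It contrasts the number of paths through the whole graph with the number of paths a haplotype panel actually supports.

```python
from itertools import product


def count_graph_paths(alleles_per_site):
    """Count source-to-sink paths through a chain of independent variant sites.

    Assumption: every site offers a free choice between its alleles, so the
    count is the product of the allele counts (2**n for n biallelic sites).
    """
    total = 1
    for num_alleles in alleles_per_site:
        total *= num_alleles
    return total


def enumerate_graph_paths(sites):
    """Yield every allele combination; only feasible for a handful of sites."""
    for combination in product(*sites):
        yield "".join(combination)


def distinct_haplotype_paths(panel):
    """Return the distinct paths observed in a haplotype panel.

    The result can never be larger than the panel itself, no matter how
    many variant sites the graph encodes.
    """
    return set(panel)


if __name__ == "__main__":
    # Hypothetical toy data: 20 biallelic sites, 3 observed haplotypes.
    sites = [("A", "G")] * 20
    panel = ["A" * 20, "G" * 20, "AG" * 10]

    print("paths in graph:", count_graph_paths(len(s) for s in sites))
    print("paths in panel:", len(distinct_haplotype_paths(panel)))
```

Restricting an index to the observed haplotypes, or to the reads being queried, bounds the work by the size of the input rather than by the number of variant combinations.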

GraphAligner: Rapid and Versatile Sequence-to-Graph Alignment

Rautiainen, Marschall 2019

Preprint


Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pan-genome graph. Yet, so far this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to state-of-the-art tools, GraphAligner is 12x faster and uses 5x less memory, making it as efficient as aligning reads to linear reference genomes. When employing GraphAligner for error correction, we find it to be almost 3x more accurate and over 15x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
