2019
DOI: 10.1101/559583
Preprint

Haplotype-aware graph indexes

Abstract
Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes.
Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform (GBWT). We demonstrate the scalability of the new impleme…
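To make the abstract's point concrete, consider a small sketch of the difference between "every path is a potential haplotype" and "paths observed in a haplotype panel". The graph below has two biallelic sites, so it contains four source-to-sink paths; a hypothetical panel of two haplotypes supports only two of them. The graph, node IDs, panel, and function names are illustrative assumptions, and the naive set lookup stands in for what the GBWT does with a compressed index; it is not VG's implementation.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]

# Hypothetical toy variation graph: nodes 2/3 and 5/6 are alternative alleles
# at two variant sites; nodes 1, 4 and 7 are shared (invariant) segments.
GRAPH: Graph = {
    1: [2, 3],
    2: [4],
    3: [4],
    4: [5, 6],
    5: [7],
    6: [7],
    7: [],
}

# Hypothetical haplotype panel: each haplotype is stored as a path of node IDs.
HAPLOTYPES: List[List[int]] = [
    [1, 2, 4, 5, 7],
    [1, 3, 4, 6, 7],
]


def all_paths(graph: Graph, start: int, end: int) -> List[List[int]]:
    """Enumerate every start-to-end path in a DAG by depth-first search."""
    if start == end:
        return [[end]]
    paths: List[List[int]] = []
    for nxt in graph[start]:
        for tail in all_paths(graph, nxt, end):
            paths.append([start] + tail)
    return paths


def supported_paths(graph: Graph, start: int, end: int,
                    haplotypes: List[List[int]]) -> List[List[int]]:
    """Keep only the graph paths that occur as a haplotype in the panel."""
    observed = {tuple(h) for h in haplotypes}
    return [p for p in all_paths(graph, start, end) if tuple(p) in observed]


if __name__ == "__main__":
    print(len(all_paths(GRAPH, 1, 7)), "paths in the graph")
    print(supported_paths(GRAPH, 1, 7, HAPLOTYPES), "supported by haplotypes")
```

The two unsupported paths ([1, 2, 4, 6, 7] and [1, 3, 4, 5, 7]) are exactly the kind of recombinations the abstract calls unlikely: valid walks in the graph that no sampled haplotype follows.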

Cited by 35 publications (43 citation statements); references 36 publications.
“…Indexing variation graphs is challenging because the number of possible paths can be exponential in the number of variants encoded. Typical approaches to handle this problem are to index only some of the variation by limiting the indexed paths either heuristically [16,27,28] or by using panels of known haplotypes [29,30]. A recent method avoids the exponential blowup by dynamically indexing the graph and the reads, thereby exploiting that there can be exponentially many paths in the graphs, but not in the set of reads to be queried [31].…”
Section: Introduction. Mentioning (confidence: 99%)
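The exponential blowup mentioned in this statement is easy to see on a chain of independent biallelic sites: each site doubles the number of paths, so n sites give 2^n paths. The sketch below counts paths by dynamic programming (memoized over nodes) instead of enumerating them, which is why it stays fast even when the count is huge. The graph layout (anchor node, REF node, ALT node, next anchor) and the function names are hypothetical illustrations, not taken from any of the cited tools.

```python
from functools import lru_cache
from typing import Dict, List


def bubble_chain(n_sites: int) -> Dict[int, List[int]]:
    """Build a DAG of n_sites biallelic bubbles: anchor -> {ref, alt} -> next anchor.

    Anchors get IDs 0, 3, 6, ..., so the final anchor (the sink) is 3 * n_sites.
    """
    graph: Dict[int, List[int]] = {}
    anchor, next_id = 0, 1
    for _ in range(n_sites):
        ref, alt, next_anchor = next_id, next_id + 1, next_id + 2
        graph[anchor] = [ref, alt]
        graph[ref] = [next_anchor]
        graph[alt] = [next_anchor]
        anchor, next_id = next_anchor, next_id + 3
    graph[anchor] = []  # sink has no outgoing edges
    return graph


def count_paths(graph: Dict[int, List[int]], source: int, sink: int) -> int:
    """Count source-to-sink paths in a DAG in time linear in its size."""

    @lru_cache(maxsize=None)
    def from_node(v: int) -> int:
        if v == sink:
            return 1
        return sum(from_node(w) for w in graph[v])

    return from_node(source)


if __name__ == "__main__":
    for n in (1, 10, 40):
        print(n, "sites ->", count_paths(bubble_chain(n), 0, 3 * n), "paths")
```

Forty sites already yield more than a trillion paths, whereas a haplotype panel contains at most one path per haplotype, which is why restricting the index to known haplotypes keeps it tractable.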
“…Nodes in the graph represent alleles at sites of variation and edges connect adjacent alleles. Once a variation-aware genome graph contains all alleles at known polymorphic sites, every haplotype can be represented as a walk through the graph [24]. However, an optimal balance between graph density and computational complexity is key to efficient whole-genome graph-based variant analysis because adding sites of variation to the graph incurs computational costs [18].…”
Section: Introduction. Mentioning (confidence: 99%)
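The statement that every haplotype can be represented as a walk through a graph whose nodes are alleles can be sketched directly: a haplotype given as one allele choice per site becomes the walk that alternates shared segments with the chosen allele nodes, and spelling that walk recovers the haplotype sequence. The node sequences, the backbone/site layout, and all function names below are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy graph: nodes 2/3 are the REF/ALT alleles of the first site,
# nodes 5/6 of the second; nodes 1, 4 and 7 are shared flanking sequence.
NODE_SEQ: Dict[int, str] = {1: "ACG", 2: "T", 3: "C", 4: "GGA", 5: "A", 6: "AT", 7: "CC"}
EDGES: Set[Tuple[int, int]] = {(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 7), (6, 7)}
BACKBONE: List[int] = [1, 4, 7]             # shared segments around the sites
SITES: List[Tuple[int, int]] = [(2, 3), (5, 6)]  # (REF node, ALT node) per site


def haplotype_walk(alleles: Sequence[int]) -> List[int]:
    """Map allele choices (0 = REF, 1 = ALT per site) to a walk of node IDs."""
    walk = [BACKBONE[0]]
    for site, allele in enumerate(alleles):
        walk.append(SITES[site][allele])
        walk.append(BACKBONE[site + 1])
    return walk


def is_walk(nodes: Sequence[int], edges: Set[Tuple[int, int]]) -> bool:
    """A node sequence is a walk if every consecutive pair is an edge."""
    return all((a, b) in edges for a, b in zip(nodes, nodes[1:]))


def spell(nodes: Sequence[int], node_seq: Dict[int, str]) -> str:
    """Concatenate node sequences along a walk to recover the haplotype."""
    return "".join(node_seq[n] for n in nodes)


if __name__ == "__main__":
    walk = haplotype_walk([0, 1])
    print(walk, is_walk(walk, EDGES), spell(walk, NODE_SEQ))
```

Adding another variant site means adding nodes and edges to the graph, which is the density versus computational cost trade-off the statement refers to.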
“…While the above heuristics are also used in vg, they recently also proposed the use of haplotyping. In vg such haplotyping is facilitated using the GBWT [17][18][19]. The GBWT is a graph extension of the positional Burrows-Wheeler transform [20], that can store the haplotypes of samples as paths in the graph, allowing for haplotype constrained read alignment.…”
Mentioning (confidence: 99%)
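Since the GBWT is described as a graph extension of the positional Burrows-Wheeler transform, it may help to see the PBWT's core step in isolation. For a panel of biallelic haplotypes, processing sites left to right and stably partitioning the current haplotype order by allele (0s before 1s) yields, at each position k, the haplotype indices sorted by their reversed prefixes; the PBWT column for site k lists the alleles at site k in that order. This is a minimal sketch of the plain PBWT prefix-array construction, not the GBWT (which works over graph node IDs rather than biallelic sites) and not vg's implementation; the panel and function names are hypothetical.

```python
from typing import List

# Hypothetical panel: 4 haplotypes over 4 biallelic sites (0 = REF, 1 = ALT).
PANEL: List[List[int]] = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]


def pbwt_prefix_arrays(haplotypes: List[List[int]]) -> List[List[int]]:
    """Return a_0..a_N: at step k, haplotype indices sorted by reversed prefix [0, k)."""
    n_sites = len(haplotypes[0])
    order = list(range(len(haplotypes)))  # a_0: input order
    arrays = [order]
    for k in range(n_sites):
        # Stable partition by the allele at site k keeps ties in a_k order.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


def pbwt_columns(haplotypes: List[List[int]], arrays: List[List[int]]) -> List[List[int]]:
    """PBWT column k: alleles at site k, listed in the order given by a_k."""
    return [[haplotypes[i][k] for i in arrays[k]] for k in range(len(haplotypes[0]))]


if __name__ == "__main__":
    arrays = pbwt_prefix_arrays(PANEL)
    print("final order:", arrays[-1])
    print("columns:", pbwt_columns(PANEL, arrays))
```

Because haplotypes that share a long recent history end up adjacent in each order, the columns tend to form long runs on real panels and compress well; the GBWT applies the same idea to paths over graph nodes, which is what allows haplotype-constrained queries.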