2020
DOI: 10.1101/2020.12.04.412486
Preprint

Genotyping common, large structural variations in 5,202 genomes using pangenomes, the Giraffe mapper, and the vg toolkit

Abstract: We introduce Giraffe, a pangenome short-read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe, part of the variation graph toolkit (vg), maps reads to thousands of human genomes at around the same speed at which BWA-MEM maps reads to a single reference genome, while maintaining accuracy comparable to that of VG-MAP, vg's original mapper. We have developed efficient genotyping pipelines using Giraffe. We demonstrate improvements in genotyping for single nucleotide variatio…
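
To make "a collection of haplotypes threaded through a sequence graph" concrete, here is a minimal Python sketch, assuming a toy graph in which each node stores a DNA string and each haplotype is stored as a list of node IDs. The names NODES, HAPLOTYPES, spell and haplotypes_containing are hypothetical, and the exact-substring lookup only stands in for read mapping; it is not how Giraffe or its indexes work.

```python
from typing import Dict, List

# Toy sequence graph (hypothetical data): node ID -> DNA sequence.
# Nodes 2 and 3 are alternative alleles of a single-base variant.
NODES: Dict[int, str] = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA", 5: "GGC"}

# Haplotypes "threaded" through the graph, each stored as a walk of node IDs.
HAPLOTYPES: Dict[str, List[int]] = {
    "hap1": [1, 2, 4, 5],
    "hap2": [1, 3, 4, 5],
}


def spell(path: List[int]) -> str:
    """Concatenate node sequences along a walk to recover a haplotype's sequence."""
    return "".join(NODES[node] for node in path)


def haplotypes_containing(read: str) -> List[str]:
    """Names of haplotypes whose spelled sequence contains the read exactly.

    A stand-in for mapping: real mappers tolerate mismatches and gaps and
    do not spell out whole haplotypes like this.
    """
    return [name for name, path in HAPLOTYPES.items() if read in spell(path)]


print(spell(HAPLOTYPES["hap1"]))        # ACGTATTCAGGC
print(haplotypes_containing("GTGTT"))   # ['hap2']
print(haplotypes_containing("TTCAGG"))  # ['hap1', 'hap2']
```

A read spanning the variant node ("GTGTT") is placed on one haplotype only, while a read in shared sequence ("TTCAGG") is compatible with both.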

Cited by 32 publications (40 citation statements)
References 61 publications (125 reference statements)
“…Our workflow provides tools to determine the origin of non-reference bases, derive structural variations from multi-assembly graphs, predict putatively novel genes and append the novel sequences linearly to a reference genome. We anticipate that the latter will become obsolete as soon as accurate and fast base-level alignment and split-read graph mapping enable the full suite of genome analyses from a reference graph [48].…”
Section: Discussion · mentioning
confidence: 99%
“…Short reads are aligned to the graph along the path of best fit, facilitating genotyping even in structurally complex and repetitive regions. Informed by a large catalog of candidate SV alleles discovered by long-read sequencing, graph genotyping thus permits the study of variants that would be difficult or impossible to discover with short-read data alone (Sibbesen et al., 2018; Chen et al., 2019; Hickey et al., 2020; Sirén et al., 2020).…”
Section: Graph Genotyping of Structural Variation · mentioning
confidence: 99%
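
The idea of aligning a short read "along the path of best fit" can be sketched as follows, assuming each candidate allele has already been spelled out as a local haplotype sequence (here a toy deletion SV). The function names, the sequences and the rule that tied reads give no support are illustrative assumptions; the score is a plain edit distance with free end gaps in the haplotype, not the scoring used by any of the cited genotypers.

```python
from typing import Dict, List


def fit_distance(read: str, hap: str) -> int:
    """Smallest edit distance between the whole read and any substring of hap.

    Dynamic programming over read (rows) and hap (columns); the first row is
    all zeros and the answer is the minimum of the last row, so unaligned
    haplotype sequence before and after the read costs nothing.
    """
    prev = [0] * (len(hap) + 1)
    for i, rc in enumerate(read, start=1):
        cur = [i]
        for j, hc in enumerate(hap, start=1):
            cur.append(min(prev[j] + 1,                # read base not in hap
                           cur[j - 1] + 1,             # hap base not in read
                           prev[j - 1] + (rc != hc)))  # match or mismatch
        prev = cur
    return min(prev)


def allele_support(reads: List[str], alleles: Dict[str, str]) -> Dict[str, int]:
    """Count reads whose best-fitting allele is unique; tied reads are skipped."""
    support = {name: 0 for name in alleles}
    for read in reads:
        scores = {name: fit_distance(read, seq) for name, seq in alleles.items()}
        best = min(scores.values())
        winners = [name for name, score in scores.items() if score == best]
        if len(winners) == 1:
            support[winners[0]] += 1
    return support


# Hypothetical deletion SV: "alt" lacks the 8 bp "ATTACAGG" present in "ref".
ALLELES = {"ref": "CCATGGATTACAGGTTCC", "alt": "CCATGGTTCC"}
READS = ["GGATTACA", "TTACAGGT", "ATGGTTCC", "CCATGG"]

print(allele_support(READS, ALLELES))  # {'ref': 2, 'alt': 1}
```

The read "CCATGG" fits both alleles equally well and is ignored. A genotyper would still have to turn per-allele support counts like these into a genotype call, typically with a statistical model; that step is not sketched here.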
“…Note that for […], this use case is not directly supported, as it is designed to genotype structural variants only; we include the results anyway as this outperformed […]. We also note that during the finalisation of this paper, a new caller based on […] (named […] [8]) was released, which we have not tested here. Third, we show how locally defined alternate references allow accessing small variants on top of diverged forms of a dimorphic gene in P. falciparum.…”
Section: Discussion · mentioning
confidence: 99%
“…There are data structures that in principle can genotype alternate alleles which include both long structural variants and SNPs; some implementations include […], […], […], […] [47]. All of these are based on graph representations of one form or another, ranging from genotyping a whole-genome de Bruijn graph ([…]), mapping all reads to a whole-genome Directed Acyclic Graph (DAG) of informative k-mers ([…]), mapping all reads to a whole-genome graph of minimizing k-mers and matched haplotype index ([…]/[…] [8]), or remapping premapped reads either to local DAGs of SNPs and indels off the reference ([…]) or to graphs built from structural variant breakpoints ([…] [9]). These all reduce the impact of reference bias and allow cohort genotyping at consistent sites, but all of them struggle with the issue of representation.…”
Section: Introduction · mentioning
confidence: 99%
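
Minimizers ("minimizing k-mers" above) can be illustrated with a short Python sketch of (w, k)-minimizers: in every window of w consecutive k-mers, the smallest k-mer is kept as a seed. This sketch assumes a linear reference, plain lexicographic order and the forward strand only; real mappers typically use a hashed order and canonical (strand-independent) k-mers, and index positions in the graph rather than in a string. The function names minimizers and seed_hits are hypothetical.

```python
from typing import Dict, List, Tuple


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """(w, k)-minimizers of seq as sorted (position, k-mer) pairs.

    For each window of w consecutive k-mers, keep the lexicographically
    smallest one (ties go to the leftmost position). Windows that share a
    minimizer report it once.
    """
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        kmer, pos = min(kmers[start:start + w])
        picked.add((pos, kmer))
    return sorted(picked)


def seed_hits(read: str, ref: str, k: int, w: int) -> List[Tuple[int, int]]:
    """(read position, reference position) pairs for minimizers shared by read and ref."""
    index: Dict[str, List[int]] = {}
    for pos, kmer in minimizers(ref, k, w):
        index.setdefault(kmer, []).append(pos)
    return [(rpos, gpos)
            for rpos, kmer in minimizers(read, k, w)
            for gpos in index.get(kmer, [])]


print(minimizers("ACGTACGTAC", k=3, w=3))
# [(0, 'ACG'), (1, 'CGT'), (4, 'ACG'), (5, 'CGT')]
print(seed_hits("GTACG", "ACGTACGTAC", k=3, w=3))
# [(2, 0), (2, 4)]
```

Because the toy reference repeats ACGT, the read's single minimizer hits two reference positions; a mapper then has to choose among such candidate seeds, for example by extending or aligning each one.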