2017
DOI: 10.1101/234856
Preprint

Sequence variation aware genome references and read mapping with the variation graph toolkit

Abstract: Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references are fundamentally limited in that they represent only one version of each locus, whereas the population may contain multiple variants. When the reference represents an individual's genome poorly, it can impact read mapping and introduce bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation, including large-scale structural variation such as inversions and duplic…

Cited by 24 publications (29 citation statements)
References 34 publications
“…While methods for including variants in the reference are growing in number [6,7,8,9,10,11,12], there is little or no work on how to choose which variants to include. Past studies have made such decisions in ad hoc ways, with some filtering according to allele frequency [13,8], ethnicity [7], or both [9].…”
Section: Introduction
mentioning
confidence: 99%
“…Given a flat reference genome, and a set of sequence resolved alleles in a VCF/BCF format, we can create a reference variant graph (RVG). This RVG is an in-memory extension of a given flat reference with paths (branches) that diverge from the reference for each of these variants, creating a graph representation similar to that described in depth in [12]. Edges of the graph represent inter-base reference locations between alleles, and nodes represent nucleotide sequences.…”
Section: Biograph Coverage
mentioning
confidence: 99%
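
The construction described in the statement above (a flat reference plus sequence-resolved alleles, with nodes holding nucleotide sequences and edges joining them) can be illustrated with a minimal Python sketch. This is not the cited tool's implementation, nor vg's: the names `VariantGraph`, `build_variant_graph`, and `spell_paths` are hypothetical, variants are passed as plain `(pos, ref, alt)` tuples standing in for VCF records, positions are assumed 0-based, and alleles are assumed non-empty and non-overlapping.

```python
from dataclasses import dataclass, field


@dataclass
class VariantGraph:
    """Toy sequence graph (hypothetical): nodes hold sequences, edges join node ids."""
    nodes: dict = field(default_factory=dict)  # node id -> nucleotide sequence
    edges: set = field(default_factory=set)    # (from node id, to node id)

    def add_node(self, seq):
        node_id = len(self.nodes) + 1
        self.nodes[node_id] = seq
        return node_id


def build_variant_graph(reference, variants):
    """Build a graph from a flat reference and (pos, ref, alt) tuples.

    Assumptions for this sketch: 0-based positions, non-empty ref/alt
    alleles, and no two variants overlapping on the reference.
    """
    graph = VariantGraph()
    frontier = []  # nodes that must connect to whatever comes next
    cursor = 0     # first reference base not yet placed in the graph
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele at {pos} does not match the reference")
        if pos > cursor:
            # Reference sequence shared by all paths up to this variant.
            shared = graph.add_node(reference[cursor:pos])
            graph.edges.update((prev, shared) for prev in frontier)
            frontier = [shared]
        # The two alleles become parallel branches of the graph.
        ref_node = graph.add_node(ref)
        alt_node = graph.add_node(alt)
        for prev in frontier:
            graph.edges.add((prev, ref_node))
            graph.edges.add((prev, alt_node))
        frontier = [ref_node, alt_node]
        cursor = pos + len(ref)
    if cursor < len(reference):
        tail = graph.add_node(reference[cursor:])
        graph.edges.update((prev, tail) for prev in frontier)
    return graph


def spell_paths(graph):
    """Yield the sequence spelled by every source-to-sink walk."""
    successors = {}
    for src, dst in graph.edges:
        successors.setdefault(src, []).append(dst)
    has_incoming = {dst for _, dst in graph.edges}

    def walk(node):
        nexts = successors.get(node)
        if not nexts:
            yield graph.nodes[node]
            return
        for nxt in sorted(nexts):
            for suffix in walk(nxt):
                yield graph.nodes[node] + suffix

    for node in sorted(graph.nodes):
        if node not in has_incoming:
            yield from walk(node)


if __name__ == "__main__":
    toy = build_variant_graph("ACGTACGT", [(2, "G", "T"), (5, "C", "CAA")])
    for haplotype in spell_paths(toy):
        print(haplotype)
```

With one substitution and one insertion, the toy graph spells four sequences, one of which is the original reference. Enumerating every path is only practical for tiny examples, since the number of paths grows exponentially with the number of variant sites.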
“…To test our peak caller, we used vg [5] to create a whole genome Arabidopsis thaliana reference graph by using variants from The 1001 Genomes Project. We selected all transcription factors listed in the transcription factor database of Expresso [10] that also had a motif in the Jaspar database of transcription factor binding profiles [11], resulting in a set of 5 transcription factors: ERF115, SEP3, AP1, SOC1, and PI.…”
Section: Validation and Testing
mentioning
confidence: 99%
“…Graph-based reference genomes offers a way to include known variants in the reference structure [3]. The software package vg supports mapping reads to a graph-based reference genome with potentially increased accuracy [4,5] as compared to mapping reads to a standard linear reference genome using tools like BWA-MEM [6] or Bowtie [7]. Several types of genomic analyses, such as variant calling and haplotyping, can now be performed using graph-based references [4,5].…”
Section: Introduction
mentioning
confidence: 99%