Efficient dynamic variation graphs

Eizenga, Jordan M.; Novak, Adam M.; Kobayashi, Emily; Villani, Flavia; Cisar, Cecilia; Heumos, Simon; Hickey, Glenn; Colonna, Vincenza; Paten, Benedict; Garrison, Erik

doi:10.1093/bioinformatics/btaa640

Cited by 28 publications

(27 citation statements)

References 7 publications


“…Following the generation of alignments with CACTUS, we used a custom pipeline to detect nodes that were not present in the Hereford genome, ARS-UCD1.2, considered as the reference genome. We first used a custom python script and the libbdsg 54 library to extract the nodes not present in any Hereford paths. These nodes have then been screened for Nmers, and then misassembled regions detected by FRC_Align 30 on the two de novo assemblies here presented were discarded.…”

Section: Genome Alignment and Comparison (mentioning)

confidence: 99%

“…We generated a linear expanded genome with the purpose of providing an easy to use, expanded version of the cattle reference genome that is also easy to implement in current best practice pipelines. We extracted all nodes not present in the linear Hereford genome, but that were found in the other 4 assemblies considered using libbdsg (v0.3) 54 . Nodes were then labelled based on the genome in which they were found (i.e.…”

Section: Linear Expanded Genome (mentioning)

confidence: 99%


A cattle graph genome incorporating global breed diversity

Talenti, Powell, Hemmink, et al. 2021

Preprint


Despite only 8% of cattle being found in Europe, European breeds dominate current genetic resources. This adversely impacts cattle research in other important global cattle breeds. To mitigate this issue, we have generated the first assemblies of African breeds, which have been integrated with genomic data for 294 diverse cattle into the first graph genome that incorporates global cattle diversity. We illustrate how this more representative reference assembly contains an extra 116.1Mb (4.2%) of sequence absent from the current Hereford sequence and consequently inaccessible to current studies. We further demonstrate how using this graph genome increases read mapping rates, reduces allelic biases and improves the agreement of structural variant calling with independent optical mapping data. Consequently, we present an improved, more representative, reference assembly that will improve global cattle research.
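The node-extraction step quoted in the citation statements above (finding nodes that no reference path visits) can be illustrated with a minimal sketch. This is not the citing authors' script and it does not call libbdsg: the graph is reduced to a toy dictionary mapping path names to lists of node IDs, and the function name, the path names, and the prefix convention for identifying reference paths are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def nodes_absent_from_reference(
    paths: Dict[str, List[int]], reference_prefix: str
) -> Set[int]:
    """Return node IDs visited by at least one path but by no reference path.

    `paths` maps a path name to the node IDs it walks through (hypothetical
    toy representation). A path counts as a reference path when its name
    starts with `reference_prefix` (an assumed naming convention).
    """
    reference_nodes: Set[int] = set()
    other_nodes: Set[int] = set()
    for name, node_ids in paths.items():
        if name.startswith(reference_prefix):
            reference_nodes.update(node_ids)
        else:
            other_nodes.update(node_ids)
    # Keep only nodes that never appear on a reference path.
    return other_nodes - reference_nodes


# Toy data: three paths over six nodes; names are invented for the example.
toy_paths = {
    "hereford.chr1": [1, 2, 4, 5],
    "breedA.chr1": [1, 3, 4, 5],
    "breedB.chr1": [1, 2, 4, 6],
}
non_reference = nodes_absent_from_reference(toy_paths, "hereford")
print(sorted(non_reference))  # [3, 6]
```

With a real graph, the same set difference would be computed by iterating over the graph's nodes and the paths that step on them, rather than over a dictionary.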


“…3.Their embedded paths are locally similar to each other. These properties are used to build efficient dynamic variation graph data structures (Siren et al, 2020;Eizenga et al, 2020a). Sparsity (1) allows us to encode edges E using adjacency lists rather than matrices or hash tables.…”

Section: Methods (mentioning)

confidence: 99%

ODGI: understanding pangenome graphs

Heumos, Nahnsen, Prins, et al. 2021

Preprint

Self Cite


Motivation: Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions to such data in an efficient way.

Results: We wrote ODGI, a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA variation graphs. ODGI includes tools for detecting complex regions, extracting loci, removing artifacts, exploratory analysis, manipulation, validation, and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs.

Availability: ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/ekg/guix-genomics/blob/master/odgi.scm.

Contact: egarris5@uthsc.edu
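The citation statement for this entry notes that sparsity allows edges to be encoded with adjacency lists rather than matrices or hash tables. A minimal sketch of that idea follows. It is not ODGI's or libbdsg's actual data structure: the class and method names are hypothetical, and node orientation (strand) is ignored to keep the example short.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class AdjacencyListGraph:
    """Toy sequence graph storing edges as per-node neighbor lists.

    Memory grows with |V| + |E|, whereas a dense adjacency matrix
    would need |V|^2 cells regardless of how few edges exist.
    """

    def __init__(self) -> None:
        self.sequences: Dict[int, str] = {}
        self.out_edges: DefaultDict[int, List[int]] = defaultdict(list)

    def add_node(self, node_id: int, sequence: str) -> None:
        self.sequences[node_id] = sequence

    def add_edge(self, src: int, dst: int) -> None:
        # Sparse graphs have short neighbor lists, so a linear
        # duplicate check is cheap here.
        if dst not in self.out_edges[src]:
            self.out_edges[src].append(dst)

    def neighbors(self, node_id: int) -> List[int]:
        # .get avoids creating an empty list for nodes without out-edges.
        return list(self.out_edges.get(node_id, []))

    def edge_count(self) -> int:
        return sum(len(dsts) for dsts in self.out_edges.values())


# A single-base variant "bubble": node 1 branches to 2 or 3, rejoining at 4.
graph = AdjacencyListGraph()
for node_id, seq in [(1, "ACG"), (2, "T"), (3, "C"), (4, "GGA")]:
    graph.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)

print(graph.neighbors(1))   # [2, 3]
print(graph.edge_count())   # 4
```

For this four-node graph, a dense matrix would hold 16 cells to represent 4 edges; the gap widens quickly as graphs grow while average node degree stays small.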


“…This is not always identical to the original sequence graph, as some nodes and edges may not be visited by any haplotype. In order to support the handle graph interface 43 , we need some additional structures:…”

Section: /31 (mentioning)

confidence: 99%

Genotyping common, large structural variations in 5,202 genomes using pangenomes, the Giraffe mapper, and the vg toolkit

Sirén, Monlong, Chang, et al. 2020

Preprint

Self Cite


We introduce Giraffe, a pangenome short read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe, part of the variation graph toolkit (vg), maps reads to thousands of human genomes at around the same speed BWA-MEM maps reads to a single reference genome, while maintaining comparable accuracy to VG-MAP, vg's original mapper. We have developed efficient genotyping pipelines using Giraffe. We demonstrate improvements in genotyping for single nucleotide variations (SNVs), insertions and deletions (indels) and structural variations (SVs) genome-wide. We use Giraffe to genotype and phase 167 thousands structural variations ascertained from long read studies in 5,202 human genomes sequenced with short reads, including the complete 1000 Genomes Project dataset, at an average cost of $1.50 per sample. We determine the frequency of these variations in diverse human populations, characterize their complex allelic variations and identify thousands of expression quantitative trait loci (eQTLs) driven by these variations.
