Adam M. Novak scite author profile

The human reference genome is part of the foundation of modern human biology and a monumental scientific achievement. However, because it excludes a great deal of common human variation, it introduces a pervasive reference bias into the field of human genomics. To reduce this bias, it makes sense to draw on representative collections of human genomes, brought together into reference cohorts. There are a number of techniques to represent and organize data gleaned from these cohorts, many using ideas implicitly or explicitly borrowed from graph-based models. Here, we survey various projects underway to build and apply these graph-based structures—which we collectively refer to as genome graphs—and discuss the improvements in read mapping, variant calling, and haplotype determination that genome graphs are expected to produce.

Pangenomics enables genotyping of known structural variants in 5202 diverse genomes

Sirén, Monlong, Chang, et al. 2021. Science.

178

209

Giraffe pangenomes

Genomes within a species often have a core, conserved component, as well as a variable set of genetic material among individuals or populations that is referred to as a “pangenome.” Inference of the relationships between pangenomes sequenced with short-read technology is often done computationally by mapping the sequences to a reference genome. The computational method affects genome assembly and comparisons, especially in cases of structural variants that are longer than an average sequenced region, for highly polymorphic loci, and for cross-species analyses. Sirén et al. present a bioinformatic method called Giraffe, which improves mapping pangenomes in polymorphic regions of the genome containing single nucleotide polymorphisms and structural variants with standard computational resources, making large-scale genomic analyses more accessible. -LMZ

Genotyping structural variants in pangenome graphs using the vg toolkit

Hickey, Heller, Monlong, et al. 2019. Preprint.

110

Structural variants (SVs) remain challenging to represent and study relative to point mutations despite their demonstrated importance. We show that variation graphs, as implemented in the vg toolkit, provide an effective means for leveraging SV catalogs for short-read SV genotyping experiments. We benchmarked vg against state-of-the-art SV genotypers using three sequence-resolved SV catalogs generated by recent long-read sequencing studies. In addition, we use assemblies from 12 yeast strains to show that graphs constructed directly from aligned de novo assemblies improve genotyping compared to graphs built from intermediate SV catalogs in the VCF format.

[...] real Illumina reads and a pangenome built from SVs discovered in recent long-read sequencing studies [21,22,23,5]. We also compared vg's performance with state-of-the-art SV genotypers: SVTyper [3], Delly Genotyper [4], BayesTyper [19], Paragraph [20] and [...]. Across the datasets we tested, which range in size from 26k to 97k SVs, vg is the best performing SV genotyper on real short-read data for all SV types in the majority of cases. Finally, we demonstrate that a pangenome graph built from the alignment of de novo assemblies of diverse Saccharomyces cerevisiae strains improves SV genotyping performance.

Results

Structural variation in vg

We used vg to implement a straightforward SV genotyping pipeline. Reads are mapped to the graph and used to compute the read support for each node and edge (see Supplementary Information for a description of the graph formalism). Sites of variation within the graph are then identified using the snarl decomposition as described in [24]. These sites correspond to intervals along the reference paths (e.g., contigs or chromosomes) which are embedded in the graph. They also contain nodes and edges deviating from the reference path, which represent variation at the site. For each site, the two most supported paths spanning its interval (haplotypes) are determined, and their relative supports are used to produce a genotype at that site (Figure 1a). The pipeline is described in detail in Methods. We rigorously evaluated the accuracy of our method on a variety of datasets, and present these results in the remainder of this section.
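The final step described above, choosing the two best-supported paths at a site and turning their relative support into a genotype, can be illustrated with a minimal Python sketch. This is not vg's implementation: the function name `genotype_site`, the input dictionary of per-path read support, and the `min_fraction` threshold of 0.2 are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def genotype_site(path_support: Dict[str, float],
                  min_fraction: float = 0.2) -> Tuple[str, ...]:
    """Return a diploid genotype for one site from per-path read support.

    path_support maps a candidate path name to its read support.
    min_fraction is a hypothetical cutoff, not a value taken from vg.
    """
    if not path_support:
        raise ValueError("no candidate paths at this site")

    # Rank candidate paths by read support, highest first.
    ranked = sorted(path_support.items(), key=lambda kv: kv[1], reverse=True)
    best, best_support = ranked[0]

    # No reads support any path: make no call.
    if best_support <= 0:
        return ()

    # Only one candidate path: homozygous for it.
    if len(ranked) == 1:
        return (best, best)

    second, second_support = ranked[1]
    # Call heterozygous only if the second path has enough support
    # relative to the best path; otherwise call homozygous.
    if second_support / best_support >= min_fraction:
        return (best, second)
    return (best, best)


if __name__ == "__main__":
    print(genotype_site({"ref": 20, "alt": 18, "alt2": 1}))
    print(genotype_site({"ref": 30, "alt": 2}))
```

A simple ratio threshold keeps the example short. A real genotyper would more plausibly use a likelihood model over read support.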

StatAlign: an extendable software package for joint Bayesian estimation of alignments and evolutionary trees

Novak, Miklós, Lyngsø, et al. 2008

We have developed an extendable software package in the Java programming language that samples from the joint posterior distribution of phylogenies, alignments and evolutionary parameters by applying the Markov chain Monte Carlo method. The package also offers tools for efficient on-the-fly summarization of the results. It has a graphical interface to configure, start and supervise the analysis, to track the status of the Markov chain and to save the results. The background model for insertions and deletions can be combined with any substitution model. It is easy to add new substitution models to the software package as plugins. The samples from the Markov chain can be summarized in several ways, and new postprocessing plugins may also be installed.
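As a rough illustration of the sampling idea, the sketch below runs a generic random-walk Metropolis sampler on a one-dimensional target and summarizes the chain on the fly with a running mean. It is not StatAlign's code (StatAlign is written in Java and samples alignments, trees and parameters jointly). The function name `metropolis`, its parameters, and the standard normal target are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def metropolis(log_post: Callable[[float], float], start: float, steps: int,
               step_size: float = 1.0, seed: int = 0) -> Tuple[List[float], float]:
    """Random-walk Metropolis sampler; returns (samples, running mean)."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    samples: List[float] = []
    running_mean = 0.0
    for i in range(1, steps + 1):
        # Propose a symmetric Gaussian move around the current state.
        proposal = x + rng.gauss(0.0, step_size)
        lp_new = log_post(proposal)
        # Accept with probability min(1, exp(lp_new - lp)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = proposal, lp_new
        samples.append(x)
        # On-the-fly summary: update the mean without storing extra state.
        running_mean += (x - running_mean) / i
    return samples, running_mean


if __name__ == "__main__":
    # Hypothetical target: a standard normal log-density (up to a constant).
    chain, mean = metropolis(lambda v: -0.5 * v * v, start=0.0, steps=20000)
    print(len(chain), round(mean, 2))
```

The running mean stands in for the kind of on-the-fly summarization the abstract mentions. Any other incremental summary could be updated at the same point in the loop.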

The Human Pangenome Project: a global resource to map genomic diversity

Genome graphs and the evolution of genome inference
