High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and to help enable a new era of discovery across the life sciences.
Phylogenetic incongruence can be caused by analytical shortcomings or can be the result of biological processes, such as hybridization, incomplete lineage sorting and gene duplication. Differentiation between these causes of incongruence is essential to unravel complex speciation and diversification events. The phylogeny of the True Geese (tribe Anserini, Anatidae, Anseriformes) was, until now, contentious, i.e., the phylogenetic relationships and the timing of divergence between the different goose species could not be fully resolved. We sequenced nineteen goose genomes (representing seventeen species, including three subspecies of the Brent Goose, Branta bernicla) and used an exon-based phylogenomic approach (41,736 exons, representing 5887 genes) to unravel the evolutionary history of this bird group. We thereby provide general guidance on the combination of whole-genome evolutionary analyses and analytical tools for such cases where previous attempts to resolve the phylogenetic history of several taxa could not be unravelled. Identical topologies were obtained using either a concatenation (based upon an alignment of 6,630,626 base pairs) or a coalescent-based consensus method. Two major lineages, corresponding to the genera Anser and Branta, were strongly supported. Within the Branta lineage, the White-cheeked Geese form a well-supported sublineage that is sister to the Red-breasted Goose (Branta ruficollis). In addition, two main clades of Anser species could be identified, the White Geese and the Grey Geese. The results from the consensus method suggest that the diversification of the genus Anser is heavily influenced by rapid speciation and by hybridization, which may explain the failure of previous studies to resolve the phylogenetic relationships within this genus. The majority of speciation events took place in the late Pliocene and early Pleistocene (between 4 and 2 million years ago), conceivably driven by a global cooling trend that led to the establishment of a circumpolar tundra belt and the emergence of temperate grasslands. Our approach will be a fruitful strategy for resolving many other complex evolutionary histories at the level of genera, species, and subspecies.