Constantina Theofanopoulou scite author profile

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Towards complete and error-free genome assemblies of all vertebrate species

Rhie

McCarthy

Fédrigo

et al. 2020

Preprint

247

464

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are only available for a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling the most accurate and complete reference genomes to date. Here we summarize these developments, introduce a set of quality standards, and present lessons learned from sequencing and assembling 16 species representing major vertebrate lineages (mammals, birds, reptiles, amphibians, teleost fishes and cartilaginous fishes). We confirm that long-read sequencing technologies are essential for maximizing genome quality and that unresolved complex repeats and haplotype heterozygosity are major sources of error in assemblies. Our new assemblies identify and correct substantial errors in some of the best historical reference genomes. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an effort to generate high-quality, complete reference genomes for all ~70,000 extant vertebrate species and help enable a new era of discovery across the life sciences.

Self-domestication in Homo sapiens: Insights from comparative genomics

et al. 2017

This study identifies and analyzes statistically significant overlaps between selective sweep screens in anatomically modern humans and several domesticated species. The results obtained suggest that (paleo-)genomic data can be exploited to complement the fossil record and support the idea of self-domestication in Homo sapiens, a process that likely intensified as our species populated its niche. Our analysis lends support to attempts to capture the “domestication syndrome” in terms of alterations to certain signaling pathways and cell lineages, such as the neural crest.

The era of reference genomes in conservation genomics

Formenti

Theissinger

Fernandes

et al. 2022

Trends in Ecology & Evolution

159

International initiatives aimed at generating genomic resources, and particularly reference genomes, have flourished in recent years. Some focus on specific taxa, such as the Vertebrate Genomes Project, Bird Genome 10K Project, Bat1K Project, Global Invertebrate Genomics Alliance, 10 000 Plant Genomes Project, and 1000 Fungal Genomes project. Others focus on geographic regions, such as the California Conservation Genomics Project, Darwin Tree of Life for Britain and Ireland, Catalan Initiative for the Earth BioGenome Project in the Catalan territories, Endemixit in Italy, Norwegian Earth Biogenome Project, and SciLifeLab in Sweden, on applications such as the LOEWE Translational Biodiversity Genomics in Germany, or on ecological systems such as the Aquatic Symbiosis Genomics project. Collectively part of the Earth BioGenome Project (EBP), in Europe these initiatives are organized under the umbrella of the European Reference Genome Atlas (ERGA).

A genome atlas of European biodiversity

ERGA is a pan-European scientific response to the current threats to biodiversity. Approximately one fifth of the ~200 000 eukaryotic species present in Europe can be inferred to be at risk of extinction according to the International Union for Conservation of Nature (IUCN) Red List classification (this estimate only considers the assessed species; https://www.iucn.org/regions/europe/our-work/biodiversity-conservation/european-red-list-threatened-species). ERGA aims to generate reference genomes of European eukaryotic species across the tree of life, including threatened, endemic, and keystone species, as well as pests and species important to agriculture, fisheries, and ecosystem function and stability. ERGA builds upon current genomic consortia in EU member states, EU Associated Countries, representatives of other countries within the European bioregion, and international collaborators. These reference genomes will address fundamental and applied questions in conservation, biology, and health. ERGA seeks to alert the EU about the potential of conservation genomics, and particularly the role of reference genomes, in biodiversity assessment, conservation strategies, and restoration efforts.

Universal nomenclature for oxytocin–vasotocin ligand and receptor families

Theofanopoulou

Gedman

Cahill

et al. 2021

Nature

Oxytocin (OXT; hereafter OT) and arginine vasopressin or vasotocin (AVP or VT; hereafter VT) are neurotransmitter ligands that function through specific receptors to control diverse functions [1,2]. Here we performed genomic analyses on 35 species that span all major vertebrate lineages, including newly generated high-contiguity assemblies from the Vertebrate Genomes Project [3,4]. Our findings support the claim [5] that OT (also known as OXT) and VT (also known as AVP) are adjacent paralogous genes that have resulted from a local duplication, which we infer was through DNA transposable elements near the origin of vertebrates and in which VT retained more of the parental sequence. We identified six major oxytocin–vasotocin receptors among vertebrates. We propose that all six of these receptors arose from a single receptor that was shared with the common ancestor of invertebrates, through a combination of whole-genome and large segmental duplications. We propose a universal nomenclature based on evolutionary relationships for the genes that encode these receptors, in which the genes are given the same orthologous names across vertebrates and paralogous names relative to each other. This nomenclature avoids confusion due to differential naming in the pre-genomic era and incomplete genome assemblies, furthers our understanding of the evolution of these genes, aids in the translation of findings across species and serves as a model for other gene families.

A hypothesis on a role of oxytocin in the social mechanisms of speech and vocal learning

2017

Language acquisition in humans and song learning in songbirds naturally happen as a social learning experience, providing an excellent opportunity to reveal social motivation and reward mechanisms that boost sensorimotor learning. Our knowledge about the molecules and circuits that control these social mechanisms for vocal learning and language is limited. Here we propose a hypothesis of a role for oxytocin (OT) in the social motivation and evolution of vocal learning and language. Building upon existing evidence, we suggest specific neural pathways and mechanisms through which OT might modulate vocal learning circuits in specific developmental stages.

Globularization and Domestication

Benítez‐Burraco

Theofanopoulou

Boeckx

2016

Topoi

This paper aims to explore a potential connection between two hypotheses recently put forward in the context of language evolution. One hypothesis argues that some human-specific change(s) in the hominin brain developmental program habilitated the neuronal workspace that enabled “cognitive modernity” to unfold, also resulting in our globularized braincase. The other argues that the cultural niche resulting from our self-domestication favored the emergence of natural languages. In this article we document numerous links between the genetic changes we have claimed may have brought about globularization and neural crest cells, which have been claimed to explain the constellation of distinctive traits (physical, cognitive, and behavioral) found in all domesticated mammals. If these links turn out to be as robust as we think they are, globularization and self-domestication may well be closely related phenomena in the context of human evolution.

How genomics can help biodiversity conservation

Theissinger

Fernandes

Formenti

et al. 2023

Trends in Genetics
