Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.
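The contig N50 quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below illustrates that definition only. The function name `contig_n50` and the toy lengths are hypothetical and are not taken from the authors' assembly pipeline.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2             # half of the assembly length
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in Mb), not real data.
toy_lengths = [2, 3, 4, 5, 6, 10]
print(contig_n50(toy_lengths))
```

Because N50 depends only on the distribution of contig lengths, it measures contiguity and says nothing about correctness, which is why the abstract reports BUSCO completeness alongside it.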
Dosage-balance selection preserves functionally redundant duplicates (paralogs) at the optimum for their combined expression. Here we present a model of the dynamics of duplicate genes coevolving under dosage-balance selection. We call this the compensatory drift model. Results show that even when strong dosage-balance selection constrains total expression to the optimum, expression of each duplicate can diverge by drift from its original level. The rate of divergence slows as the strength of stabilizing selection, the size of the mutation effect, and/or the size of the population increases. We show that dosage-balance selection impedes neofunctionalization early after duplication but can later facilitate it. We fit this model to data from sodium channel duplicates in 10 families of teleost fish; these include two convergent lineages of electric fish in which one of the duplicates neofunctionalized. Using the model, we estimated the strength of dosage-balance selection for these genes. The results indicate that functionally redundant paralogs may still undergo radical functional changes after a prolonged period of compensatory drift.
KEYWORDS: duplication; expression evolution; dosage balance; neofunctionalization; whole-genome duplication
The fate of duplicate genes is characterized by two extremes: degeneration and the origin of biological novelty. Early models for the evolutionary dynamics of duplicates suggested that typically one member of a duplicate pair would quickly degenerate into a nonfunctional pseudogene (Haldane 1933; Ohno 1970). More rarely, a duplicate instead may evolve a novel function in a process called neofunctionalization (Muller 1936; Ohno 1970; Ohta 1987). The time scale for either pseudogenization or neofunctionalization is expected to be on the order of a few million years (Lynch and Conery 2000).
Recent research indicates, however, that the evolutionary dynamics for many duplicates are not so simple (Walsh 1995, 2003; Force et al. 1999; Papp et al. 2003; He and Zhang 2005; Rastogi and Liberles 2005; Scannell and Wolfe 2008; Qian et al. 2010; Kondrashov 2012). Some genes are dosage sensitive, meaning that a change in their copy number alters expression and disrupts the stoichiometric balance of their gene products with those of other genes. Duplicates of dosage-sensitive genes typically will fix in a population only if they originate in a whole-genome duplication (WGD), where all interacting partners duplicate together. Selection to maintain the stoichiometric relations between the products of duplicate genes, termed dosage-balance selection, can preserve duplicates as functionally redundant copies for prolonged periods of time (Birchler et al. 2001, 2005; Veitia 2002; Papp et al. 2003; Aury et al. 2006; Blomme et al. 2006; Freeling and Thomas 2006; Stranger et al. 2007; Qian and Zhang 2008; Edger and Pires 2009; Makino and McLysaght 2010; Konrad et al. 2011; Birchler and Veitia 2012; McGrath et al. 2014a).
Recent data on a pair of sodium channel duplicates in teleost fish are cons...
Neuronal resting potential can tune the excitability of neural networks, affecting downstream behavior. Sodium leak channels (NALCN) play a key role in rhythmic behaviors by helping to set, or subtly change, neuronal resting potential. The full complexity of these newly described channels is just beginning to be appreciated, however. NALCN channels can associate with numerous subunits in different tissues and can be activated by several different peptides and second messengers. We recently showed that NALCN channels are closely related to fungal calcium channels, which they functionally resemble. Here, we use this relationship to predict a family of NALCN-associated proteins in animals on the basis of homology with the yeast protein Mid1, the subunit of the yeast calcium channel. These proteins all share a cysteine-rich region that is necessary for Mid1 function in yeast. We validate this predicted association by showing that the Mid1 homolog in Drosophila, encoded by the CG33988 gene, is coordinately expressed with NALCN, and that knockdown of either protein creates identical phenotypes in several behaviors associated with NALCN function. The relationship between Mid1 and leak channels has therefore persisted over a billion years of evolution, despite drastic changes to both proteins and the organisms in which they exist.
Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of high-quality assemblies for 101 lines of 95 drosophilid species encompassing 14 species groups and 35 sub-groups, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. These assemblies, along with a detailed wet-lab protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution within this key group.
Ion channels have played a substantial role in the evolution of novel traits across all of the domains of life. A fascinating example of a novel adaptation is the convergent evolution of electric organs in the Mormyroid and Gymnotiform electric fishes. The regulated currents that flow through ion channels directly generate the electrical signals which have evolved in these fish. Here, we investigated how the expression evolution of two sodium channel paralogs (Scn4aa and Scn4ab) influenced their convergent molecular evolution following the teleost-specific whole-genome duplication. We developed a reliable assay to accurately measure the expression stoichiometry of these genes and used this technique to analyze relative expression of the duplicate genes in a phylogenetic context. We found that before a major shift in expression from skeletal muscle and neofunctionalization in the muscle-derived electric organ, Scn4aa was first downregulated in the ancestors of both electric lineages. This indicates that underlying the convergent evolution of this gene, there was a greater propensity toward neofunctionalization due to its decreased expression relative to its paralog Scn4ab. We investigated another derived muscle tissue, the sonic organ of Porichthys notatus, and show that, as in the electric fishes, Scn4aa again shows a radical shift in expression away from the ancestral muscle cells into the evolutionarily novel muscle-derived tissue. This study presents evidence that expression downregulation facilitates neofunctionalization after gene duplication, a pattern that may often set the stage for novel trait evolution after gene duplication.
Transcriptomes are key to understanding the relationship between genotype and phenotype. The ability to infer the expression state (active or inactive) of genes in the transcriptome offers unique benefits for addressing this issue. For example, qualitative changes in gene expression may underlie the origin of novel phenotypes, and expression states are readily comparable between tissues and species. However, inferring the expression state of genes is a surprisingly difficult problem, owing to the complex biological and technical processes that give rise to observed transcriptomic datasets. Here, we develop a hierarchical Bayesian mixture model that describes this complex process and allows us to infer the expression state of genes from replicate transcriptomic libraries. We explore the statistical behavior of this method with analyses of simulated datasets, where we demonstrate its ability to correctly infer true (known) expression states, and of empirical benchmark datasets, where we demonstrate that the expression states inferred from RNA-sequencing (RNA-seq) datasets using our method are consistent with those based on independent evidence. The power of our method to correctly infer expression states is generally high and, remarkably, approaches the maximum possible power for this inference problem. We present an empirical analysis of primate-brain transcriptomes, which identifies genes that have a unique expression state in humans. Our method is implemented in the freely available R package zigzag.
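To make the idea of inferring an expression state from a mixture model concrete, the sketch below computes the posterior probability that a gene is active under a deliberately simplified two-component Gaussian mixture on log expression. This is an illustrative toy and not the hierarchical model implemented in zigzag. The function names, the component means and standard deviations, and the mixture weight are all assumptions chosen for the example.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def prob_active(log_expr, weight_active=0.5,
                inactive=(-4.0, 1.5), active=(3.0, 1.5)):
    """Posterior probability that a gene is active, by Bayes' rule.

    `inactive` and `active` are hypothetical (mean, sd) pairs for the
    log-expression distribution of each state; `weight_active` is the
    assumed prior fraction of active genes.
    """
    like_active = weight_active * normal_pdf(log_expr, *active)
    like_inactive = (1 - weight_active) * normal_pdf(log_expr, *inactive)
    return like_active / (like_active + like_inactive)


# Hypothetical log-expression values, not real data.
for value in (-4.0, -0.5, 3.0):
    print(value, round(prob_active(value), 4))
```

With these toy parameters, a gene is assigned to a state by its posterior probability and not by a fixed expression threshold. A full treatment would also estimate the component parameters from the data and model variation among replicate libraries, which this sketch omits.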
Transcriptomes are key to understanding the relationship between genotype and phenotype. The ability to infer the expression state (active or inactive) of genes in the transcriptome offers unique benefits for addressing this issue. For example, qualitative changes in gene expression may underlie the origin of novel phenotypes, and expression states are readily comparable between tissues and species. However, inferring the expression state of genes is a surprisingly difficult problem, owing to the complex biological and technical processes that give rise to observed transcriptomic datasets. Here, we develop a hierarchical Bayesian mixture model that describes this complex process and allows us to infer the expression state of genes from replicate transcriptomic libraries. We explore the statistical behavior of this method with analyses of simulated datasets, where we demonstrate its ability to correctly infer true (known) expression states, and of empirical benchmark datasets, where we demonstrate that the expression states inferred from RNA-seq datasets using our method are consistent with those based on independent evidence. The power of our method to correctly infer expression states is generally high and, remarkably, approaches the maximum possible power for this inference problem. We present an empirical analysis of primate-brain transcriptomes, which identifies genes that have a unique expression state in humans. Our method is implemented in the freely available R package zigzag.
detecting zero transcripts of a given gene in a given tissue does not necessarily indicate that it is inactive. Third, even when we detect transcripts of a given gene, its measured expression level is likely to vary among libraries owing to both biological factors (e.g., population-level variation) and technical factors (i.e., the relative abundance of a given transcript in a given library depends on the total transcript number of that library). Therefore, the rank order in expression level of two genes in one library may differ from their rank order in a second library, which complicates methods that infer the expression state of genes based on fixed expression-level thresholds (17, 21).
Here, we present a hierarchical Bayesian model that describes the biological and technical processes that generate transcriptomic data and that, by explicitly accommodating the factors described above, allows us to infer the expression state of each gene from replicate RNA-seq libraries. We present analyses of simulated datasets that validate the implementation and characterize the statistical behavior of our hierarchical Bayesian model. We also apply our method to several empirical datasets, and demonstrate that the expression states inferred using our method are consistent with expectations based on independent information, such as epigenetic marks and developmental-genetic studies. Finally, we demonstrate our method with an empirical analysis of primate-brain transcriptomes that identi...