Reference collections of multiple Drosophila lines with accumulating collections of “omics” data have proven especially valuable for the study of population genetics and complex trait genetics. Here we present a description of a resource collection of 84 strains of Drosophila melanogaster whose genome sequences were obtained after 12 generations of full-sib inbreeding. The initial rationale for this resource was to foster development of a systems biology platform for modeling metabolic regulation by the use of natural polymorphisms as perturbations. As reference lines, they are amenable to repeated phenotypic measurements, and already a large collection of metabolic traits has been assayed. Another key feature of these strains is their widespread geographic origin, coming from Beijing, Ithaca, the Netherlands, Tasmania, and Zimbabwe. After we obtained 12.5× coverage of paired-end Illumina sequence reads, SNP and indel calls were made with the GATK platform. Thorough quality control was enabled by deep sequencing one line to >100×, and single-nucleotide polymorphisms and indels were validated using ddRAD-sequencing as an orthogonal platform. In addition, a series of preliminary population genetic tests was performed with these single-nucleotide polymorphism data for assessment of data quality. We found 83 segregating inversions among the lines, and as expected these were especially abundant in the African sample. We anticipate that this collection will make a useful addition to the set of reference D. melanogaster strains, thanks to its geographic structuring and unusually high level of genetic diversity.
The human face is complex and multipartite, and characterization of its genetic architecture remains challenging. Using a multivariate genome-wide association study (GWAS) meta-analysis of 8,246 European individuals, we identified 203 genome-wide significant signals (120 also study-wide significant) associated with normal-range facial variation. Follow-up analyses find that the regions surrounding these signals are enriched for enhancer activity in cranial neural crest cells and craniofacial tissues, several regions harbor multiple signals with associations to different facial phenotypes, and there is evidence for potential coordinated actions of variants. In sum, our analyses provide insights for understanding how complex morphological traits are shaped by both individual and coordinated genetic actions.
Complete genome sequences contain valuable information about natural selection, but this information is difficult to access for short, widely scattered noncoding elements such as transcription factor binding sites or small noncoding RNAs. Here, we introduce a new computational method, called Inference of Natural Selection from Interspersed Genomically coHerent elemenTs (INSIGHT), for measuring the influence of natural selection on such elements. INSIGHT uses a generative probabilistic model to contrast patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites, pooling weak information from many short elements in a manner that accounts for variation among loci in mutation rates and coalescent times. The method is able to disentangle the contributions of weak negative, strong negative, and positive selection based on their distinct effects on patterns of polymorphism and divergence. It obtains information about divergence from multiple outgroup genomes using a general statistical phylogenetic approach. The INSIGHT model is efficiently fitted to genome-wide data using an approximate expectation maximization algorithm. Using simulations, we show that the method can accurately estimate the parameters of interest even in complex demographic scenarios, and that it significantly improves on methods based on summary statistics describing polymorphism and divergence. To demonstrate the usefulness of INSIGHT, we apply it to several classes of human noncoding RNAs and to GATA2-binding sites in the human genome.
Mirtrons are microRNA (miRNA) substrates that utilize the splicing machinery to bypass the necessity of Drosha cleavage for their biogenesis. Expanding our recent efforts for mammalian mirtron annotation, we use meta-analysis of aggregate datasets to identify ~500 novel mouse and human introns that confidently generate diced small RNA duplexes. These comprise nearly 1000 total loci distributed in four splicing-mediated biogenesis subclasses, with 5'-tailed mirtrons as, by far, the dominant subtype. Thus, mirtrons surprisingly comprise a substantial fraction of endogenous Dicer substrates in mammalian genomes. Although mirtron-derived small RNAs exhibit overall expression correlation with their host mRNAs, we observe a subset with substantial differences that suggest regulated processing or accumulation. We identify characteristic sequence, length, and structural features of mirtron loci that distinguish them from bulk introns, and find that mirtrons preferentially emerge from genes with larger numbers of introns. While mirtrons generate miRNA-class regulatory RNAs, we also find that mirtrons exhibit many features that distinguish them from canonical miRNAs. We observe that conventional mirtron hairpins are substantially longer than Drosha-generated pre-miRNAs, indicating that the characteristic length of canonical pre-miRNAs is not a general feature of Dicer substrate hairpins. In addition, mammalian mirtrons exhibit unique patterns of ordered 5' and 3' heterogeneity, which reveal hidden complexity in miRNA processing pathways. These include broad 3'-uridylation of mirtron hairpins, atypically heterogeneous 5' termini that may result from exonucleolytic processing, and occasionally robust decapitation of the 5' guanine (G) of mirtron-5p species defined by splicing. 
Altogether, this study reveals that this extensive class of non-canonical miRNA bears a multitude of characteristic properties, many of which raise general mechanistic questions regarding the processing of endogenous hairpin transcripts.
We expanded the knowledge base for Drosophila cell line transcriptomes by deeply sequencing their small RNAs. In total, we analyzed more than 1 billion raw reads from 53 libraries across 25 cell lines. We verify reproducibility of biological replicate data sets, determine common and distinct aspects of miRNA expression across cell lines, and infer the global impact of miRNAs on cell line transcriptomes. We next characterize their commonalities and differences in endo-siRNA populations. Interestingly, most cell lines exhibit enhanced TE-siRNA production relative to tissues, suggesting this as a common aspect of cell immortalization. We also broadly extend annotations of cis-NAT-siRNA loci, identifying ones with common expression across diverse cells and tissues, as well as cell-restricted loci. Finally, we characterize small RNAs in a set of ovary-derived cell lines, including somatic cells (OSS and OSC) and a mixed germline/somatic cell population (fGS/OSS) that exhibits ping-pong piRNA signatures. Collectively, the ovary data reveal new genic piRNA loci, including unusual configurations of piRNA-generating regions. Together with the companion analysis of mRNAs described in a previous study, these small RNA data provide comprehensive information on the transcriptional landscape of diverse Drosophila cell lines. These data should encourage broader usage of fly cell lines, beyond the few that are presently in common usage.
Relatively little is known about the in vivo functions of newly emerging genes, especially in metazoans. Although prior RNAi studies reported prevalent lethality among young gene knockdowns, our phylogenomic analyses reveal that young genes are frequently restricted to the nonessential male reproductive system. We performed large-scale CRISPR/Cas9 mutagenesis of "conserved, essential" and "young, RNAi-lethal" genes and broadly confirmed the lethality of the former but the viability of the latter. Nevertheless, certain young gene mutants exhibit defective spermatogenesis and/or male sterility. Moreover, we detected widespread signatures of positive selection on young male-biased genes. Thus, young genes have a preferential impact on male reproductive system function.
The propensity of animal miRNAs to regulate targets bearing modest complementarity, most notably via pairing with miRNA positions ∼2-8 (the "seed"), is believed to drive major aspects of miRNA evolution. First, minimal targeting requirements have allowed most conserved miRNAs to acquire large target cohorts, thus imposing strong selection on miRNAs to maintain their seed sequences. Second, the modest pairing needed for repression suggests that evolutionarily nascent miRNAs may generally induce net detrimental, rather than beneficial, regulatory effects. Hence, levels and activities of newly emerged miRNAs are expected to be limited to preserve the status quo of gene expression. In this study, we unexpectedly show that Drosophila testes specifically express a substantial miRNA population that contravenes these tenets. We find that multiple genomic clusters of testis-restricted miRNAs harbor recently evolved miRNAs, whose experimentally verified orthologs exhibit divergent sequences, even within seed regions. Moreover, this class of miRNAs exhibits higher expression and greater phenotypic capacities in transgenic misexpression assays than do non-testis-restricted miRNAs of similar evolutionary age. These observations suggest that these testis-restricted miRNAs may be evolving adaptively, and several methods of evolutionary analysis provide strong support for this notion. Consistent with this, proof-of-principle tests show that orthologous miRNAs with divergent seeds can distinguish target sensors in a species-cognate manner. Finally, we observe that testis-restricted miRNA clusters exhibit extraordinary dynamics of miRNA gene flux in other Drosophila species. Altogether, our findings reveal a surprising tissue-directed influence of miRNA evolution, involving a distinct mode of miRNA function connected to adaptive gene regulation in the testis.
The human face is complex and multipartite, and characterization of its genetic architecture remains intriguingly challenging. Applying GWAS to multivariate shape phenotypes, we identified 203 genomic regions associated with normal-range facial variation, 117 of which are novel. The associated regions are enriched for both genes relevant to craniofacial and limb morphogenesis and enhancer activity in cranial neural crest cells and craniofacial tissues. Genetic variants grouped by their contribution to similar aspects of facial variation show high within-group correlation of enhancer activity, and four SNP pairs display evidence of epistasis, indicating potentially coordinated actions of variants within the same cell types or tissues. In sum, our analyses provide new insights for understanding how complex morphological traits are shaped by both individual and coordinated genetic actions.
Main Text:
"One of the major problems confronting modern biology is to understand how complex morphological structures arise during development and how they are altered during evolution" […] "complicated developmental choreography" in which intrinsic genetic factors, epigenetic factors, and interactions between the two make up the progeny genotype, which engages with the environment to ultimately produce a complex morphological trait, defined thus by its composition from a number of separate component parts [1]. We now understand that the intrinsic genetic factors ultimately contributing to complex morphological traits consist not only of single variants altering protein structure and/or function, but also non-coding variants and interactions among variants, each affecting multiple tissues and developmental timepoints. This realization necessitates the development and utilization of methods capable of describing the genetic architecture of complex morphological traits, which includes identifying the individual genetic variants contributing to morphological variation as well as their interactions [2,3].
The human face is an exemplar complex morphological structure. It is a highly multipartite structure resulting from the intricate coordination of genetic, cellular, and environmental factors [4-6]. Through prior genetic association studies of quantitative traits, 51 loci have been implicated in normal-range craniofacial morphology, and an additional 50 loci have been associated with self-reported nose size or chin dimples in a large cohort study [7] (Table S1). However, as with all complex morphological traits, our ability to identify and describe the genetic architecture of the face is limited by our ability to accurately characterize its phenotypic variation [4], identify variants of both large and small effect [8], and identify interactions between variants. We previously described a novel data-driven approach to facial phenotyping, which facilitated the identification and replication of 15 loci involved in global-to-local variation in facial morphology [9]. Here, we apply this phenotyping approach...