Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
In fruit fly research, chromosomal deletions are indispensable tools for mapping mutations, characterizing alleles and identifying interacting loci. Most widely used deletions were generated by irradiation or chemical mutagenesis. These methods are labor-intensive, generate random breakpoints and result in unwanted secondary mutations that can confound phenotypic analyses. Most of the existing deletions are large, have molecularly undefined endpoints and are maintained in genetically complex stocks. Furthermore, the existence of haplolethal or haplosterile loci makes the recovery of deletions of certain regions exceedingly difficult by traditional methods, resulting in gaps in coverage. Here we describe two methods that address these problems by providing for the systematic isolation of targeted deletions in the D. melanogaster genome. The first strategy used a P element-based technique to generate deletions that closely flank haploinsufficient genes and minimize undeleted regions. This deletion set has increased overall genomic coverage by 5-7%. The second strategy used FLP recombinase and the large array of FRT-bearing insertions described in the accompanying paper to generate 519 isogenic deletions with molecularly defined endpoints. This second deletion collection provides 56% genome coverage so far. The latter methodology enables the generation of small custom deletions with predictable endpoints throughout the genome and should make their isolation a simple and routine task.
Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. We identified new genes, transcripts, and proteins using poly(A)+ RNA sequence from Drosophila melanogaster cultured cell lines, dissected organ systems, and environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long noncoding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, arising from combinatorial usage of promoters, splice sites, and polyadenylation sites.
FlyBase (flybase.org) is a knowledge base that supports the community of researchers that use the fruit fly, Drosophila melanogaster, as a model organism. The FlyBase team curates and organizes a diverse array of genetic, molecular, genomic, and developmental information about Drosophila. At the beginning of 2018, ‘FlyBase 2.0’ was released with a significantly improved user interface and new tools. Among these important changes are a new organization of search results into interactive lists or tables (hitlists), enhanced reference lists, and new protein domain graphics. An important new data class called ‘experimental tools’ consolidates information on useful fly strains and other resources related to a specific gene, which significantly enhances the ability of the Drosophila researcher to design and carry out experiments. With the release of FlyBase 2.0, there has also been a restructuring of backend architecture and a continued development of application programming interfaces (APIs) for programmatic access to FlyBase data. In this review, we describe these major new features and functionalities of the FlyBase 2.0 site and how they support the use of Drosophila as a model organism for biological discovery and translational research.
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
The sequencing of the human genome and the genomes of dozens of other metazoan species has intensified the need for systematic methods to extract biological information directly from DNA sequence. Comparative genomics has emerged as a powerful methodology for this endeavour [1,2]. Comparison of few (two-four) closely related genomes has proven successful for the discovery of protein-coding genes [3-5], RNA genes [6,7], miRNA genes [8-11] and catalogues of regulatory elements [3,4,12-14]. The resolution and discovery power of these studies should increase with the number of genomes [15-20], in principle enabling the systematic discovery of all conserved functional elements.
The fruitfly Drosophila melanogaster is an ideal system for developing and evaluating comparative genomics methodologies.
Over the past century, Drosophila has been a pioneering model in which many of the basic principles governing animal development and population biology were established [21]. In the past decade, the genome sequence of D. melanogaster provided one of the first systematic views