To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
Systematic annotation of gene regulatory elements is a major challenge in genome science. Direct mapping of chromatin modification marks and transcriptional factor binding sites genome-wide 1,2 has successfully identified specific subtypes of regulatory elements 3. In Drosophila several pioneering studies have provided genome-wide identification of Polycomb-Response Elements 4, chromatin states 5, transcription factor binding sites (TFBS) 6–9, PolII regulation 8, and insulator elements 10; however, comprehensive annotation of the regulatory genome remains a significant challenge. Here we describe results from the modENCODE cis-regulatory annotation project. We produced a map of the Drosophila melanogaster regulatory genome based on more than 300 chromatin immuno-precipitation (ChIP) datasets for eight chromatin features, five histone deacetylases (HDACs) and thirty-eight site-specific transcription factors (TFs) at different stages of development. Using these data we inferred more than 20,000 candidate regulatory elements and we validated a subset of predictions for promoters, enhancers, and insulators in vivo. We also identified nearly 2,000 genomic regions of dense TF binding associated with chromatin activity and accessibility. We discovered hundreds of new TF co-binding relationships and defined a TF network with over 800 potential regulatory relationships.
Insulators are DNA sequences that control the interactions among genomic regulatory elements and act as chromatin boundaries. A thorough understanding of their location and function is necessary to address the complexities of metazoan gene regulation. We studied by ChIP–chip the genome-wide binding sites of 6 insulator-associated proteins—dCTCF, CP190, BEAF-32, Su(Hw), Mod(mdg4), and GAF—to obtain the first comprehensive map of insulator elements in Drosophila embryos. We identify over 14,000 putative insulators, including all classically defined insulators. We find two major classes of insulators defined by dCTCF/CP190/BEAF-32 and Su(Hw), respectively. Distributional analyses of insulators revealed that particular sub-classes of insulator elements are excluded between cis-regulatory elements and their target promoters; divide differentially expressed, alternative, and divergent promoters; act as chromatin boundaries; are associated with chromosomal breakpoints among species; and are embedded within active chromatin domains. Together, these results provide a map demarcating the boundaries of gene regulatory units and a framework for understanding insulator function during the development and evolution of Drosophila.
FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomics data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists. Rationale: With the completion of increasing numbers of genome sequences has come an explosion in the development of both computational and experimental techniques for deciphering the functions of genes, molecules and their interactions. These include theoretical methods for deducing function, such as analysis of protein homologies, structural domain predictions, phylogenetic profiling and analysis of protein domain fusions, as well as experimental techniques, such as microarray-based gene expression and transcription factor binding studies, two-hybrid protein-protein interaction screens, and large-scale RNA interference (RNAi) screens. The result is a huge amount of information and a current challenge is to extract meaningful knowledge and patterns of biological significance that can lead to new experimentally testable hypotheses. Many of these broad datasets, however, are noisy and the data quality can vary significantly. While in some circumstances the data from each of these techniques are useful in their own right, the ability to combine data from different sources facilitates interpretation and potentially allows stronger inferences to be made. Currently, biological data are stored in a wide variety of formats in numerous different places, making their combined analysis difficult: when information from several different databases is required, the assembly of data into a format suitable for querying is a challenge in itself.
Sophisticated analysis of diverse data requires that they are available in a form that allows questions to be asked across them and that tools for constructing the questions are available. The development of systems for the integration and combined analysis of diverse data remains a priority in bioinformatics. Avoiding the need to understand and reformat many different data sources is a major benefit for end users of a centralized data access system. A number of studies have illustrated the power of integrating data for cross-validation, functional annotation and generating testable hypotheses (reviewed in [1,2]). These studies have covered a range of data types; some looking at the overlap between two different data sets, for example, protein interaction and expression data [3][4][5][6] [...] Another key component is the use of ontologies that provide a standardized system for naming biological entities and their relationships and this aspect is based on the approach taken by the Chado schema [28]. For example, a large part of the FlyMine data model is based on the Sequence Ontology (a controlled-vocabulary for describing biological sequences) [29...
Background: The mosquito, Anopheles gambiae, is the primary vector of human malaria, a disease responsible for millions of deaths each year. To improve strategies for controlling transmission of the causative parasite, Plasmodium falciparum, we require a thorough understanding of the developmental mechanisms, physiological processes and evolutionary pressures affecting life-history traits in the mosquito. Identifying genes expressed in particular tissues or involved in specific biological processes is an essential part of this process. Results: In this study, we present transcription profiles for ~82% of annotated Anopheles genes in dissected adult male and female tissues. The sensitivity afforded by examining dissected tissues found gene activity in an additional 20% of the genome that is undetected when using whole-animal samples. The somatic and reproductive tissues we examined each displayed patterns of sexually dimorphic and tissue-specific expression. By comparing expression profiles with Drosophila melanogaster we also assessed which genes are well conserved within the Diptera versus those that are more recently evolved. Conclusions: Our expression atlas and associated publicly available database, the MozAtlas (http://www.tissue-atlas.org), provide information on the relative strength and specificity of gene expression in several somatic and reproductive tissues, isolated from a single strain grown under uniform conditions. The data will serve as a reference for other mosquito researchers by providing a simple method for identifying where genes are expressed in the adult; in addition, our resource will provide insights into the evolutionary diversity associated with gene expression levels among species.
We describe a collection of P-element insertions that have considerable utility for generating custom chromosomal aberrations in Drosophila melanogaster. We have mobilized a pair of engineered P elements, p{RS3} and p{RS5}, to collect 3243 lines unambiguously mapped to the Drosophila genome sequence. The collection contains, on average, an element every 35 kb. We demonstrate the utility of the collection for generating custom chromosomal deletions that have their end points mapped, with base-pair resolution, to the genome sequence. The collection was generated in an isogenic strain, thus affording a uniform background for screens where sensitivity to genetic background is high. The entire collection, along with a computational and genetic toolbox for designing and generating custom deletions, is publicly available. Using the collection it is theoretically possible to generate >12,000 deletions between 1 bp and 1 Mb in size by simple eye color selection. In addition, a further 37,000 deletions, selectable by molecular screening, may be generated. We are now using the collection to generate a second-generation deficiency kit that is precisely mapped to the genome sequence.
Genetically tractable model organisms are valuable research tools for uncovering basic biological principles that are conserved through evolution. Many molecular pathways, such as signaling cascades, gene regulatory pathways, and cell cycle control circuits, were first characterized genetically in model systems. The subsequent molecular cloning of the genes involved in such pathways has shown how evolution has utilized basic molecular building blocks to control a wide variety of biological processes. Key to the success of such approaches has been the ability to carry out genetic screens for components that function in particular pathways and characterize how individual genes participate in such pathways. The fruit fly, Drosophila melanogaster, is one such tractable model that has been used extensively to elucidate many conserved genetic hierarchies. One particularly powerful approach with Drosophila is the ability to rapidly carry out focused genome-wide screens for pathway components by identifying loci that modify specific phenotypes (see St. Johnston 2002 for review). In this approach, a sensitized genetic background, most commonly exhibiting an easily scored adult phenotype such as rough eyes or a wing defect, is used to search for mutations in genes that make the phenotype more se- [...] sensitized background and the phenotype is assessed. Importantly, the mutagenized chromosome is heterozygous, allowing genetic interactions between the sensitized background and mutations that are homozygous lethal to be detected. Particularly useful tools for such [...] specific recombinase (FRT site) placed within intron one. In the case of RS3, a second FRT site is placed upstream of the first of the mini-white exons; in the case of RS5 the second FRT site is located downstream of the mini-white exons. Golic and Golic demonstrated how a pair of RS3 and RS5 e...
We describe a second-generation deficiency kit for Drosophila melanogaster composed of molecularly mapped deletions on an isogenic background, covering 77% of the Release 5.1 genome. Using a previously reported collection of FRT-bearing P-element insertions, we have generated 655 new deletions and verified a set of 209 deletion-bearing fly stocks. In addition to deletions, we demonstrate how the P elements may also be used to generate a set of custom inversions and duplications, particularly useful for balancing difficult regions of the genome carrying haplo-insufficient loci. We describe a simple computational resource that facilitates selection of appropriate elements for generating custom deletions. Finally, we provide a computational resource that facilitates selection of other mapped FRT-bearing elements that, when combined with the DrosDel collection, can theoretically generate over half a million precisely mapped deletions. The availability of chromosomal deletion collections is of considerable benefit to the Drosophila research community for gene mapping, the phenotypic characterization of alleles, and genomewide genetic interaction screens. A core deficiency kit, composed of 270 genetically heterogeneous deletions covering 92% of the genome, has been built up over many years by the Bloomington Drosophila Stock Center (BDSC; http://flystocks.bio.indiana.edu/Browse/df-dp/dfkit-info.htm). Continuing efforts by the Bloomington Center are currently focused on expanding genome coverage by recovering deletions in the vicinity of haplo-insufficient regions (K. Cook, personal communication). Despite the considerable utility of this collection, it does, by its very nature, suffer from a number of limitations. These include a heterogeneous genetic background, the presence of uncharacterized second-site mutations, and, for most deletions, molecularly undefined breakpoints.
More recently, two groups have taken advantage of two key technologies, large collections of transposon insertions precisely mapped to the Drosophila genome sequence and site-specific recombination, to develop tools for producing custom chromosomal deletions in homogeneous genetic backgrounds that are mapped to the genome sequence with single-base-pair resolution (Parks et al. 2004; Ryder et al. 2004; Thibault et al. 2004). (Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession nos. AJ545047-AJ547612 and AJ622065-AJ622812.) In both cases, the new deletion collections are generated using FLP-mediated recombination between pairs of transposon-borne FRT sites, a method originally developed in Drosophila by Golic and Golic (1996). In one case (Parks et al. 2004), a set of >29,000 P-element and piggyBac insertions (Thibault et al. 2004) were used to generate 519 deletions covering 56% of the euchromatic genome (the Exelixis collection). The high number of starting insertions used by this group allows fine-scale coverage of the genome with relatively small deletions; the average size of the exist...
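The idea of generating deletions between pairs of mapped, FRT-bearing insertions can be illustrated with a minimal Python sketch. It is not the computational resource described in these articles. The `Insertion` class, the `candidate_deletions` function, the insertion names, and the coordinates are all hypothetical, and the 1 Mb default size limit is an assumption. The sketch also ignores element type, FRT orientation, and any other compatibility rule, and treats every same-arm pair within the size limit as a candidate.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Insertion:
    """A mapped insertion (hypothetical record layout, for illustration only)."""
    name: str      # made-up identifier
    arm: str       # chromosome arm, e.g. "2L"
    position: int  # insertion coordinate in bp


def candidate_deletions(insertions, max_size=1_000_000):
    """Return (left, right, size) for same-arm pairs at most max_size bp apart.

    Assumption: any two insertions on one arm can be paired. Real pairing
    rules (element type, FRT orientation) are deliberately not modeled.
    """
    by_arm = {}
    for ins in insertions:
        by_arm.setdefault(ins.arm, []).append(ins)

    results = []
    for arm_insertions in by_arm.values():
        ordered = sorted(arm_insertions, key=lambda i: i.position)
        for left, right in combinations(ordered, 2):
            size = right.position - left.position
            if 0 < size <= max_size:
                results.append((left, right, size))
    return results


# Made-up coordinates, not taken from any real collection.
demo = [
    Insertion("ins_a", "2L", 100_000),
    Insertion("ins_b", "2L", 135_000),
    Insertion("ins_c", "2L", 1_500_000),
    Insertion("ins_d", "3R", 120_000),
]

for left, right, size in candidate_deletions(demo):
    print(f"{left.name}-{right.name} on {left.arm}: {size} bp")
```

With these made-up inputs only the `ins_a`/`ins_b` pair qualifies, because the other same-arm pairs exceed the assumed size limit and `ins_d` has no partner on its arm. Because the breakpoints are the insertion coordinates themselves, each candidate deletion is defined to the base pair.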
Insulator or enhancer-blocking elements are proposed to play an important role in the regulation of transcription by preventing inappropriate enhancer/promoter interaction. The zinc-finger protein CTCF is well studied in vertebrates as an enhancer blocking factor, but Drosophila CTCF has only been characterised recently. To date only one endogenous binding location for CTCF has been identified in the Drosophila genome, the Fab-8 insulator in the Abdominal-B locus in the Bithorax complex (BX-C). We carried out chromatin immunopurification coupled with genomic microarray analysis to identify CTCF binding sites within representative regions of the Drosophila genome, including the 3-Mb Adh region, the BX-C, and the Antennapedia complex. Location of in vivo CTCF binding within these regions enabled us to construct a robust CTCF binding-site consensus sequence. CTCF binding sites identified in the BX-C map precisely to the known insulator elements Mcp, Fab-6, and Fab-8. Other CTCF binding sites correlate with boundaries of regulatory domains allowing us to locate three additional presumptive insulator elements; “Fab-2,” “Fab-3,” and “Fab-4.” With the exception of Fab-7, our data indicate that CTCF is directly associated with all known or predicted insulators in the BX-C, suggesting that the functioning of these insulators involves a common CTCF-dependent mechanism. Comparison of the locations of the CTCF sites with characterised Polycomb target sites and histone modification provides support for the domain model of BX-C regulation.