We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization. We report here the completion of the fully annotated genome sequence of the simple eukaryote Schizosaccharomyces pombe, a fission yeast. It becomes the sixth eukaryotic genome to be sequenced, following Saccharomyces cerevisiae [1], Caenorhabditis elegans [2], Drosophila melanogaster [3], Arabidopsis thaliana [4] and Homo sapiens [5,6]. The entire sequence of the unique regions of the three chromosomes is complete, with gaps in the centromeric regions of about 40 kb, and about 260 kb in the telomeric regions. The completion of this sequence, the availability of sophisticated research methodologies, and the expanding community working on S. pombe will accelerate the use of S. pombe for functional and comparative studies of eukaryotic cell processes.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
We explored transcriptional responses of the fission yeast Schizosaccharomyces pombe to various environmental stresses. DNA microarrays were used to characterize changes in expression profiles of all known and predicted genes in response to five stress conditions: oxidative stress caused by hydrogen peroxide, heavy metal stress caused by cadmium, heat shock caused by temperature increase to 39°C, osmotic stress caused by sorbitol, and DNA damage caused by the alkylating agent methylmethane sulfonate. We define a core environmental stress response (CESR) common to all, or most, stresses. There was a substantial overlap between CESR genes of fission yeast and the genes of budding yeast that are stereotypically regulated during stress. CESR genes were controlled primarily by the stress-activated mitogen-activated protein kinase Sty1p and the transcription factor Atf1p. S. pombe also activated gene expression programs more specialized for a given stress or a subset of stresses. In general, these "stress-specific" responses were less dependent on the Sty1p mitogen-activated protein kinase pathway and may involve specific regulatory factors. Promoter motifs associated with some of the groups of coregulated genes were identified. We compare and contrast global regulation of stress genes in fission and budding yeasts and discuss evolutionary implications.
Sexual reproduction requires meiosis to produce haploid gametes, which in turn can fuse to regenerate a diploid organism. We have studied the transcriptional program that drives this developmental process in Schizosaccharomyces pombe using DNA microarrays. Here we show that hundreds of genes are regulated in successive waves of transcription that correlate with major biological events of meiosis and sporulation. Each wave is associated with specific promoter motifs. Clusters of neighboring genes (mostly close to telomeres) are co-expressed early in the process, which reflects a more global control of these genes. We find that two Atf-like transcription factors are essential for the expression of late genes and formation of spores, and identify dozens of potential Atf target genes. Comparison with the meiotic program of the distantly related Saccharomyces cerevisiae reveals an unexpectedly small shared meiotic transcriptome, suggesting that the transcriptional regulation of meiosis evolved independently in both species.
FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomics data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists. Rationale: With the completion of increasing numbers of genome sequences has come an explosion in the development of both computational and experimental techniques for deciphering the functions of genes, molecules and their interactions. These include theoretical methods for deducing function, such as analysis of protein homologies, structural domain predictions, phylogenetic profiling and analysis of protein domain fusions, as well as experimental techniques, such as microarray-based gene expression and transcription factor binding studies, two-hybrid protein-protein interaction screens, and large-scale RNA interference (RNAi) screens. The result is a huge amount of information and a current challenge is to extract meaningful knowledge and patterns of biological significance that can lead to new experimentally testable hypotheses. Many of these broad datasets, however, are noisy and the data quality can vary significantly. While in some circumstances the data from each of these techniques are useful in their own right, the ability to combine data from different sources facilitates interpretation and potentially allows stronger inferences to be made. Currently, biological data are stored in a wide variety of formats in numerous different places, making their combined analysis difficult: when information from several different databases is required, the assembly of data into a format suitable for querying is a challenge in itself.
Sophisticated analysis of diverse data requires that they be available in a form that allows questions to be asked across them and that tools for constructing the questions are available. The development of systems for the integration and combined analysis of diverse data remains a priority in bioinformatics. Avoiding the need to understand and reformat many different data sources is a major benefit for end users of a centralized data access system. A number of studies have illustrated the power of integrating data for cross-validation, functional annotation and generating testable hypotheses (reviewed in [1,2]). These studies have covered a range of data types; some looking at the overlap between two different data sets, for example, protein interaction and expression data [3][4][5][6] ... Another key component is the use of ontologies that provide a standardized system for naming biological entities and their relationships and this aspect is based on the approach taken by the Chado schema [28]. For example, a large part of the FlyMine data model is based on the Sequence Ontology (a controlled vocabulary for describing biological sequences) [29...
Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of ‘widgets’ performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages. Availability: Freely available from http://www.intermine.org under the LGPL license. Contact: g.micklem@gen.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Background: The genome of the fission yeast Schizosaccharomyces pombe has recently been sequenced, setting the stage for the post-genomic era of this increasingly popular model organism. We have built fission yeast microarrays, optimised protocols to improve array performance, and carried out experiments to assess various characteristics of microarrays.
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.