The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein-coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
The MITOP database (http://websvr.mips.biochem.mpg.de/proj/medgen/mitop/) consolidates information on both nuclear- and mitochondrial-encoded genes and their proteins. The five species files (Saccharomyces cerevisiae, Mus musculus, Caenorhabditis elegans, Neurospora crassa and Homo sapiens) include annotated data derived from a variety of online resources and the literature. A wide spectrum of search facilities is given in the interrelated sections 'Gene catalogues', 'Protein catalogues', 'Homologies', 'Pathways and metabolism' and 'Human disease catalogue', including extensive references and hyperlinks for each entry. Precomputed FASTA searches using all the MITOP yeast protein entries and a list of the best EST hits with graphical cluster alignments related to the yeast reference sequence are presented. The MITOP orthologue tables with cross-listing to all the protein entries for each species in the database facilitate investigations into interspecies homology. A program (MITOPROT) is available to identify mitochondrial targeting sequences, and graphical depictions of several important mitochondrial processes are included. The 'Human disease catalogue' lists a total of 101 disorders related to mitochondrial protein abnormalities, sorted by clinical criteria and age of onset.
MITOP (http://www.mips.biochem.mpg.de/proj/medgen/mitop/) is a comprehensive database for genetic and functional information on both nuclear- and mitochondrial-encoded proteins and their genes. The five species files (Saccharomyces cerevisiae, Mus musculus, Caenorhabditis elegans, Neurospora crassa and Homo sapiens) include annotated data derived from a variety of online resources and the literature. A wide spectrum of search facilities is given in the overlapping sections 'Gene catalogues', 'Protein catalogues', 'Homologies', 'Pathways and metabolism' and 'Human disease catalogue', including extensive references and hyperlinks to other databases. Central features are the results of various homology searches, which should facilitate investigations into interspecies relationships. Precomputed FASTA searches using all the MITOP yeast protein entries and a list of the best human EST hits with graphical cluster alignments related to the yeast reference sequence are presented. The orthologue tables with cross-listings to all the protein entries for each species in MITOP have been expanded by adding the genomes of Rickettsia prowazekii and Escherichia coli. To find new mitochondrial proteins, the complete yeast genome has been analyzed using the MITOPROT program, which identifies mitochondrial targeting sequences. The 'Human disease catalogue' contains tables with a total of 110 human diseases related to mitochondrial protein abnormalities, sorted by clinical criteria and age of onset. MITOP should contribute to the systematic genetic characterization of the mitochondrial proteome in relation to human disease.
Arabidopsis thaliana is an important model system for plant biologists. In 1996 an international collaboration (the Arabidopsis Genome Initiative) was formed to sequence the whole genome of Arabidopsis and in 1999 the sequence of the first two chromosomes was reported. The sequence of the last three chromosomes and an analysis of the whole genome are reported in this issue. Here we present the sequence of chromosome 3, organized into four sequence segments (contigs). The two largest (13.5 and 9.2 Mb) correspond to the top (long) and the bottom (short) arms of chromosome 3, and the two small contigs are located in the genetically defined centromere. This chromosome encodes 5,220 of the roughly 25,500 predicted protein-coding genes in the genome. About 20% of the predicted proteins have significant homology to proteins in eukaryotic genomes for which the complete sequence is available, pointing to important conserved cellular functions among eukaryotes.
Arabidopsis thaliana is the first plant for which the complete genome has been sequenced and published. Annotation of complex eukaryotic genomes requires more than the assignment of genetic elements to the sequence. Besides completing the list of genes, we need to discover their cellular roles, their regulation and their interactions in order to understand the workings of the whole plant. The MIPS Arabidopsis thaliana Database (MAtDB; http://mips.gsf.de/proj/thal/db) started out as a repository for genome sequence data in the European Scientists Sequencing Arabidopsis (ESSA) project and the Arabidopsis Genome Initiative. Our aim is to transform MAtDB into an integrated biological knowledge resource by integrating diverse data, tools, query and visualization capabilities and by creating a comprehensive resource for Arabidopsis as a reference model for other species, including crop plants.
We report the sequence of a 7941 bp DNA fragment from the left arm of chromosome VII of Saccharomyces cerevisiae which contains four open reading frames (ORFs) longer than 100 amino acid residues. ORF biC834 shows 100% bp identity with the recently identified multicopy suppressor gene of the pop2 mutation (MPT5); its deduced protein product carries an eight-repeat domain region, homologous to that found in the hypothetical regulatory YGL023 protein of S. cerevisiae and the Pumilio protein of Drosophila. The protein encoded by ORF biE560 exhibits patterns typical of serine/threonine protein kinases, with which it shares a high degree of homology.