The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.
Previously, we presented evidence that it is possible to predict functional coupling between genes based on conservation of gene clusters between genomes. With the rapid increase in the availability of prokaryotic sequence data, it has become possible to verify and apply the technique. In this paper, we extend our characterization of the parameters that determine the utility of the approach, and we generalize the approach in a way that supports detection of common classes of functionally coupled genes (e.g., transport and signal transduction clusters). Now that the analysis includes over 30 complete or nearly complete genomes, it has become clear that this approach will play a significant role in supporting efforts to assign functionality to the remaining uncharacterized genes in sequenced genomes.

Gene clusters are known to be prominent features of bacterial chromosomes. Demerec and Hartman (1) postulated in 1959 that "regardless of how the gene clusters originated, natural selection must act to prevent their separation" and the "mere existence of such arrangements shows that they must be beneficial, conferring an evolutionary advantage on individuals and populations which exhibit them." One of the most striking features of prokaryotic gene clusters is that typically they are composed of functionally related genes. For the past 40 years, there has been vigorous, ongoing discussion on the functional significance of gene arrangement on the chromosome, as well as the origin and mechanisms of maintenance of gene clusters (see, for example, refs. 2-5).

Here, we present a method that uses conserved gene clusters from a large number of genomes to predict functional coupling between genes in those genomes. This article further develops the approach that we previously reported (6) and uses this method to reconstruct several major metabolic and functional subsystems.
Methodology

The data presented below are computed via the WIT system (http://wit.mcs.anl.gov/WIT2/), developed by Overbeek et al. (7) at Argonne National Laboratory. WIT was designed and implemented to support genetic sequence analysis, metabolic reconstructions, and comparative analysis of sequenced genomes; it currently contains data from over 30 genomes, albeit a few of them are incomplete.

Our approach to detection of conserved clusters of genes is based on the following definitions: a set of genes occurring on a prokaryotic chromosome will be called a "run" if and only if they all occur on the same strand and the gaps between adjacent genes are 300 bp or less. Any pair of genes occurring within a single run is called "close." Given two genes X_a and X_b from two genomes G_a and G_b, X_a and X_b are called a "bidirectional best hit (BBH)" if and only if recognizable similarity exists between them (in our case, we required FASTA3 scores lower than 1.0 × 10⁻⁵), there is no gene Z_b in G_b that is more similar than X_b is to X_a, and there is no gene Z_a in G_a that is more similar than X_a is to Computation of PCBBHs for 31 complete or ne...
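The definitions of "run", "close", and "bidirectional best hit" given above can be illustrated with a minimal Python sketch. This is not the WIT implementation. The `Gene` record, the function names, the coordinate convention (gap taken as the next gene's start minus the previous gene's end), and the `scores` dictionary of pairwise similarity scores are all assumptions made for illustration. The sketch also assumes that a gene on the opposite strand interrupts a run, and that lower scores mean greater similarity.

```python
from dataclasses import dataclass
from itertools import combinations

MAX_GAP_BP = 300      # maximum gap between adjacent genes in a run (from the text)
MAX_SCORE = 1.0e-5    # similarity threshold; scores must be lower than this


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; the field names are illustrative."""
    gene_id: str
    strand: str   # "+" or "-"
    start: int
    end: int


def find_runs(genes, max_gap=MAX_GAP_BP):
    """Split the genes of one chromosome into maximal runs.

    Genes are walked in chromosomal order. A run ends when the strand
    changes or the gap to the previous gene exceeds max_gap.
    """
    runs, current = [], []
    for gene in sorted(genes, key=lambda g: g.start):
        if current:
            prev = current[-1]
            if gene.strand != prev.strand or gene.start - prev.end > max_gap:
                runs.append(current)
                current = []
        current.append(gene)
    if current:
        runs.append(current)
    return runs


def close_pairs(genes):
    """Return every unordered pair of gene ids that share a run."""
    pairs = set()
    for run in find_runs(genes):
        for a, b in combinations(run, 2):
            pairs.add(frozenset((a.gene_id, b.gene_id)))
    return pairs


def bidirectional_best_hits(scores, max_score=MAX_SCORE):
    """Return the (x_a, x_b) pairs that are each other's best hit.

    scores maps (gene id in G_a, gene id in G_b) to a similarity score.
    Pairs at or above max_score are ignored. Ties keep the first pair seen.
    """
    best_for_a, best_for_b = {}, {}
    for (xa, xb), score in scores.items():
        if score >= max_score:
            continue
        if xa not in best_for_a or score < scores[(xa, best_for_a[xa])]:
            best_for_a[xa] = xb
        if xb not in best_for_b or score < scores[(best_for_b[xb], xb)]:
            best_for_b[xb] = xa
    return {(xa, xb) for xa, xb in best_for_a.items() if best_for_b.get(xb) == xa}
```

Computing maximal runs is a design choice made here for simplicity: two genes share some run exactly when they fall in the same maximal run, so the "close" relation is unchanged.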
Defining the gene products that play an essential role in an organism's functional repertoire is vital to understanding the system level organization of living cells. We used a genetic footprinting technique for a genome-wide assessment of genes required for robust aerobic growth of Escherichia coli in rich media. We identified 620 genes as essential and 3,126 genes as dispensable for growth under these conditions. Functional context analysis of these data allows individual functional assignments to be refined. Evolutionary context analysis demonstrates a significant tendency of essential E. coli genes to be preserved throughout the bacterial kingdom. Projection of these data over metabolic subsystems reveals topologic modules with essential and evolutionarily preserved enzymes with reduced capacity for error tolerance.

Sequencing and comparative analysis of multiple diverse genomes is revolutionizing contemporary biology by providing a framework for interpreting and predicting the physiologic properties of an organism. A variety of emerging postgenomic techniques such as genome-wide expression profiling and monitoring of macromolecular complex formation can reveal the detailed molecular compositions of cells. New computational approaches to exploring the inherent organization of cellular networks, the mode and dynamics of interactions among cellular constituents, are in early stages of development (14,22,23). These techniques allow us to begin unraveling a major paradigm of cellular biology: how biological properties arise from the large number of components making up an individual cell.
The lactic acid bacterium Streptococcus thermophilus is widely used for the manufacture of yogurt and cheese. This dairy species of major economic importance is phylogenetically close to pathogenic streptococci, raising the possibility that it has a potential for virulence. Here we report the genome sequences of two yogurt strains of S. thermophilus. We found a striking level of gene decay (10% pseudogenes) in both microorganisms. Many genes involved in carbon utilization are nonfunctional, in line with the paucity of carbon sources in milk. Notably, most streptococcal virulence-related genes that are not involved in basic cellular processes are either inactivated or absent in the dairy streptococcus. Adaptation to the constant milk environment appears to have resulted in the stabilization of the genome structure. We conclude that S. thermophilus has evolved mainly through loss-of-function events that remarkably mirror the environment of the dairy niche, resulting in a severely diminished pathogenic potential.

Supplementary information: The online version of this article (doi:10.1038/nbt1034) contains supplementary material, which is available to authorized users.