A computational method is proposed for inferring protein interactions from genome sequences on the basis of the observation that some pairs of interacting proteins have homologs in another organism fused into a single protein chain. Searching sequences from many genomes revealed 6809 such putative protein-protein interactions in Escherichia coli and 45,502 in yeast. Many members of these pairs were confirmed as functionally related; computational filtering further enriches for interactions. Some proteins have links to several other proteins; these coupled links appear to represent functional interactions such as complexes or pathways. Experimentally confirmed interacting pairs are documented in a Database of Interacting Proteins.

The lives of biological cells are controlled by interacting proteins in metabolic and signaling pathways and in complexes such as the molecular machines that synthesize and use adenosine triphosphate (ATP), replicate and translate genes, or build up the cytoskeletal infrastructure (1). Our knowledge of protein-protein interactions has been accumulated from biochemical and genetic experiments, including the widely used yeast two-hybrid test (2). Here we ask if protein-protein interactions can be recognized from genome sequences by purely computational means.

Some interacting proteins such as the Gyr A and Gyr B subunits of Escherichia coli DNA gyrase are fused into a single chain in another organism, in this case the topoisomerase II of yeast (3). Thus, the sequence similarities of Gyr A (804 amino acid residues) and Gyr B (875 residues) to different segments of the topoisomerase II (1429 residues) might be used to predict that Gyr A and Gyr B interact in E. coli.

To find other such putative protein interactions in E. coli, we searched the 4290 protein sequences of the E. coli genome (4) for these patterns of sequence homology (5).
We found 6809 pairs of nonhomologous sequences, both members of the pair having significant similarity (6) to a single protein in some other genome that we term a Rosetta Stone sequence because it deciphers the interaction between the protein pairs. The 4290 proteins could form at most $(4290)^2/2 \approx 9 \times 10^6$ pair interactions, but we would expect many fewer interactions in a functioning cell; roughly 2 to 10 interactions for each protein does not seem unreasonably many. Each of these 6809 pairs is a candidate for a pair of interacting proteins in E. coli. Five such candidates are shown in Fig. 1. The first three pairs of E. coli proteins were among those readily confirmed from the biochemical literature to interact. The final two pairs of proteins are not known to interact. They are representatives of many such pairs whose putative interactions at this time must be taken as testable hypotheses.

We devised three independent tests of interactions predicted by the method we term domain fusion analysis, each showing that a reasonable fraction may in fact interact. The first method uses the annotation of proteins given in the SWISS-PROT database (7). For cases wh...
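The pairing logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hit` record, the `candidate_pairs` and `max_pairs` helpers, the `max_overlap` parameter, the `ProtX` protein, and all alignment coordinates are hypothetical, and the sketch assumes that significant similarity hits and a set of known homologous pairs have already been computed by some external search. Two query proteins are linked when both hit the same protein in another genome (the Rosetta Stone sequence) on segments that do not overlap, and the two queries are not themselves homologous.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One significant similarity hit (hypothetical record layout)."""
    query: str    # protein in the genome of interest, e.g. "GyrA"
    target: str   # protein in another genome, e.g. "TopoII"
    t_start: int  # first target residue covered by the alignment
    t_end: int    # last target residue covered by the alignment


def overlap(a: Hit, b: Hit) -> int:
    """Number of target residues covered by both alignments."""
    return max(0, min(a.t_end, b.t_end) - max(a.t_start, b.t_start) + 1)


def candidate_pairs(hits, homologous=frozenset(), max_overlap=0):
    """Map each linked query pair to the set of targets that link it.

    `homologous` holds alphabetically sorted (q1, q2) tuples that are
    excluded because the two queries resemble each other (assumption:
    this set is supplied by the caller).
    """
    by_target = defaultdict(list)
    for h in hits:
        by_target[h.target].append(h)

    links = defaultdict(set)
    for target, group in by_target.items():
        for a, b in combinations(group, 2):
            if a.query == b.query:
                continue  # two hits of the same protein are not a pair
            pair = tuple(sorted((a.query, b.query)))
            if pair in homologous:
                continue  # only nonhomologous pairs are candidates
            if overlap(a, b) <= max_overlap:
                links[pair].add(target)  # distinct segments of one chain
    return dict(links)


def max_pairs(n: int) -> float:
    """Upper bound n^2 / 2 on pair interactions, as used in the text."""
    return n * n / 2


# Illustrative input only: the coordinates are invented, not taken from the paper.
example_hits = [
    Hit("GyrB", "TopoII", 1, 650),
    Hit("GyrA", "TopoII", 700, 1400),
    Hit("ProtX", "TopoII", 720, 1380),  # hypothetical homolog of GyrA
]
example_links = candidate_pairs(example_hits, homologous={("GyrA", "ProtX")})
```

With this toy input, `example_links` links Gyr A with Gyr B through the fused topoisomerase II chain, and `max_pairs(4290)` reproduces the upper bound of roughly 9 × 10^6 pairs quoted in the text. A real analysis would also need a significance threshold for each hit, which is left outside the sketch.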
The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)--the two smallest characterized cucurbit mitochondrial genomes--and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors.
Abstract: We report the complete mitochondrial genome sequence of the flowering plant Amborella trichopoda. This enormous, 3.9 Mb genome contains six genome equivalents of foreign mitochondrial DNA, acquired from green algae, mosses, and other angiosperms. Many of these horizontal transfers were large, including acquisition of entire mitochondrial genomes from three green algae and one moss. We propose a fusion-compatibility model to explain these findings, with Amborella capturing whole mitochondria from diverse eukaryotes, followed by mitochondrial fusion (limited mechanistically to green plant mitochondria), and then genome recombination. Amborella's epiphyte load, propensity to produce suckers from wounds, and low rate of mitochondrial DNA loss probably all contribute to the high level of foreign DNA in its mitochondrial genome.
Point mutations result from errors made during DNA replication or repair, so they are usually expected to be homogeneous across all regions of a genome. However, we have found a region of chloroplast DNA in plants related to sweetpea (Lathyrus) whose local point mutation rate is at least 20 times higher than elsewhere in the same molecule. There are very few precedents for such heterogeneity in any genome, and we suspect that the hypermutable region may be subject to an unusual process such as repeated DNA breakage and repair. The region is 1.5 kb long and coincides with a gene, ycf4, whose rate of evolution has increased dramatically. The product of ycf4, a photosystem I assembly protein, is more divergent within the single genus Lathyrus than between cyanobacteria and other angiosperms. Moreover, ycf4 has been lost from the chloroplast genome in Lathyrus odoratus and separately in three other groups of legumes. Each of the four consecutive genes ycf4-psaI-accD-rps16 has been lost in at least one member of the legume "inverted repeat loss" clade, despite the rarity of chloroplast gene losses in angiosperms. We established that accD has relocated to the nucleus in Trifolium species, but were unable to find nuclear copies of ycf4 or psaI in Lathyrus. Our results suggest that, as well as accelerating sequence evolution, localized hypermutation has contributed to the phenomenon of gene loss or relocation to the nucleus.