A simple classification scheme that uses only the presence or absence of a protein domain architecture has been used to determine the phylogeny of 174 complete genomes. The method correctly divides the 174 taxa into Archaea, Bacteria, and Eukarya and satisfactorily sorts most of the major groups within these superkingdoms. The most challenging problem involved 119 Bacteria, many of which have reduced genomes. When a weighting factor was used that takes account of difference in genome size (number of considered folds), small-genome taxa were mostly grouped with their full-sized counterparts. Although not every organism appears exactly at its classical phylogenetic position in these trees, the agreement appears comparable with the efforts of others by using sophisticated sequence analysis and/or combinations of gene content and gene order. During the course of the study, it emerged that there is a core set of ≈50 folds that is found in all 174 genomes and a single fold diagnostic of all Archaea.

fold superfamily

The advent of the era of complete genome sequences has led to a variety of approaches for determining the evolutionary history of organisms over and beyond the comparison of the sequences themselves (1-4), including the use of such features as concatenated protein sequences (5, 6), gene content (1-3, 7), gene order (8-10), and the distribution of structural folds (11-15). Such efforts have continued even though there are those who feel the construction of a unified phylogeny is a hopeless task, horizontal gene transfers having been too pervasive to allow a singular depiction (16). In this vein, it is fair to say that the resulting phylogenies have not been entirely consistent between one method and another, and certainly none on its own has resulted in a wholly satisfactory classification. Attempts to filter out anomalies (17) or the use of combinations of various approaches (9, 10) have been more satisfactory, but incongruities remain.

The principal goal of these endeavors is to generate a phylogeny that best represents the evolutionary histories of the taxa represented, and that resolves previous incongruities. It is generally agreed that three major forces are at work in modifying the genetic information in any genome: (i) expansion (gene duplication), (ii) deletion (gene loss), and (iii) exchange (horizontal transfer) (18-22). Additionally, there must be some degree of de novo "gene genesis," the concoction of new genes by various means (23). The challenge is to find the level of informational bundling that best accounts for this combination of events.

Here we report a simple scheme that uses a structural attribute, the protein domain content, as the principal determinant of relatedness. In particular, we have focused on the fold superfamily level (FSF) as opposed to the fold grouping itself that has been used by many other workers in the past (11-15). It is a subtle but critical distinction (14). The mere presence or absence of an FSF in a genome, as opposed to its overall abundance, was ...
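The presence/absence idea described above can be illustrated with a small sketch. The excerpt does not give the form of the weighting factor, so the normalization used here (dividing the shared FSF count by the size of the smaller repertoire) is an assumption chosen only to show how a size correction can pull a reduced genome toward its full-sized relative. The FSF identifiers, genome names, and function names are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Unweighted presence/absence distance: 1 - |shared| / |union|."""
    if not a or not b:
        raise ValueError("each genome needs at least one FSF")
    return 1.0 - len(a & b) / len(a | b)


def size_weighted_distance(a: set, b: set) -> float:
    """Presence/absence distance corrected for repertoire size.

    ASSUMPTION: the shared count is divided by the smaller repertoire,
    so a reduced genome whose FSFs are a subset of a larger relative's
    gets distance 0. This is not necessarily the paper's weighting.
    """
    if not a or not b:
        raise ValueError("each genome needs at least one FSF")
    return 1.0 - len(a & b) / min(len(a), len(b))


def distance_matrix(genomes: dict, dist) -> dict:
    """Pairwise distances keyed by alphabetically ordered name pairs."""
    return {
        (x, y): dist(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


# Toy data with made-up FSF identifiers (not from the study).
genomes = {
    "full_bacterium": {"fsf1", "fsf2", "fsf3", "fsf4", "fsf5", "fsf6"},
    "reduced_bacterium": {"fsf1", "fsf2", "fsf3"},
    "archaeon": {"fsf1", "fsf7", "fsf8", "fsf9"},
}

unweighted = distance_matrix(genomes, jaccard_distance)
weighted = distance_matrix(genomes, size_weighted_distance)
```

In this toy case the unweighted distance between the full and reduced bacterium is 0.5, while the size-corrected distance is 0, which is the kind of effect the weighting is described as having on small-genome taxa. A distance matrix like this could then be passed to any standard tree-building method.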
WW domains are protein modules that bind proline-rich ligands. WW domain-ligand complexes are of importance as they have been implicated in several human diseases such as muscular dystrophy, cancer, hypertension, Alzheimer's, and Huntington's diseases. We report the results of a protein array aimed at mapping all the human WW domain protein-protein interactions. Our biochemical approach integrates parallel synthesis of peptides, protein expression, and high-throughput screening methodology combined with tools of bioinformatics. The results suggest that the majority of the bioinformatically predicted WW peptide ligands and most WW domains are functional, and that only about 10% of the measured domain-ligand interactions are positive. The analysis of the WW domain protein arrays also underscores the importance of the amino acid residues surrounding the WW ligand core motifs for specific binding to WW domains. In addition, the methodology presented here allows for the rapid elucidation of WW domain-ligand interactions with multiple applications including prediction of exact WW ligand binding sites, which can be applied to the mapping of other protein signaling domain families. Such information can be applied to the generation of protein interaction networks and identification of potential drug targets. To our knowledge, this report describes the first protein-protein interaction map of a domain in the human proteome.
Because of the rise in atmospheric oxygen 2.3 billion years ago (Gya) and the subsequent changes in oceanic redox state over the last 2.3-1 Gya, trace metal bioavailability in marine environments has changed dramatically. Although theorized to have influenced the biological usage of metals leaving discernable genomic signals, a thorough and quantitative test of this hypothesis has been lacking. Using structural bioinformatics and whole-genome sequences, the Fe-, Zn-, Mn-, and Co-binding metallomes of 23 Archaea, 233 Bacteria, and 57 Eukarya were constructed. These metallomes reveal that the overall abundances of these metal-binding structures scale to proteome size as power laws with a unique set of slopes for each Superkingdom of Life. The differences in the power describing the abundances of Fe-, Mn-, Zn-, and Co-binding proteins in the proteomes of Prokaryotes and Eukaryotes are similar to the theorized changes in the abundances of these metals after the oxygenation of oceanic deep waters. This phenomenon suggests that Prokarya and Eukarya evolved in anoxic and oxic environments, respectively, a hypothesis further supported by structures and functions of Fe-binding proteins in each Superkingdom. Also observed is a proliferation in the diversity of Zn-binding protein structures involved in protein-DNA and protein-protein interactions within Eukarya, an event unlikely to occur in either an anoxic or euxinic environment where Zn concentrations would be vanishingly low. We hypothesize that these conserved trends are proteomic imprints of changes in trace metal bioavailability in the ancient ocean that highlight a major evolutionary shift in biological trace metal usage.

bioinorganic chemistry | evolution | fold families | structural bioinformatics

The emergence of oxygenic photosynthesis is associated with major changes in global biogeochemistry and metabolism (1, 2). In particular, the rise in atmospheric oxygen ≈2.3 billion years ago (Gya) (3, 4) potentially led to the oxygenation of the entire ocean (5), whereas an alternative theory proposes that the deep ocean became euxinic (anoxic and sulfidic) ≈1.8 Gya (6, 7), before an oxygenation of deep waters ≈1 Gya (8). Putting aside for now when and where, these changes in the overall redox state of the ocean would dramatically influence trace metal chemistry and bioavailability, with an anoxic ocean being characterized by relatively high Fe, Mn, and Co but low Zn concentrations (9) (Fig. 4, which is published as supporting information on the PNAS web site). A euxinic ocean would have comparatively lower concentrations of all of these metals, particularly Zn (9) (Fig. 4). The oxygenation of oceanic deep waters would have dramatically increased Zn concentrations, with concomitant yet less severe decreases in Fe, Mn, and Co levels (9) (Fig. 4). As postulated by Williams and Frausto da Silva (10), these drastic shifts in metal bioavailability theoretically influenced the selection of trace elements for biological usage, leaving a record within the genomes and proteomes ...
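The abstract above states that metal-binding structure abundances scale with proteome size as power laws. A relation of the form N = c · P^k becomes a straight line on log-log axes, so the exponent k can be estimated by ordinary least squares on the logarithms. The following is a minimal sketch of that general technique, not the authors' fitting procedure; the function name `fit_power_law` and the data are hypothetical, and the data are synthetic rather than taken from the study.

```python
import math


def fit_power_law(sizes, counts):
    """Fit counts = c * sizes**k by least squares on log-log axes.

    Returns (k, c). All inputs must be strictly positive.
    """
    if len(sizes) != len(counts) or len(sizes) < 2:
        raise ValueError("need two or more paired observations")
    if any(v <= 0 for v in list(sizes) + list(counts)):
        raise ValueError("power-law fitting requires positive values")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                       # slope on log-log axes = exponent
    c = math.exp(mean_y - k * mean_x)   # intercept back-transformed
    return k, c


# Synthetic example generated from counts = 0.5 * size**1.2 (made-up values).
sizes = [500, 1000, 2000, 4000, 8000]
counts = [0.5 * s ** 1.2 for s in sizes]
k, c = fit_power_law(sizes, counts)
```

Fitting each superkingdom and each metal separately in this way would yield one exponent per group, which is the kind of quantity the abstract compares between Prokaryotes and Eukaryotes.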
Background: Protein structural domains are evolutionary units whose relationships can be detected over long evolutionary distances. The evolutionary history of protein domains, including the origin of protein domains, the identification of domain loss, transfer, duplication and combination with other domains to form new proteins, and the formation of the entire protein domain repertoire, is of great interest.

Methodology/Principal Findings: A methodology is presented for providing a parsimonious domain history based on gain, loss, vertical and horizontal transfer derived from the complete genomic domain assignments of 1015 organisms across the tree of life. When mapped to species trees the evolutionary history of domains and domain combinations is revealed, and the general evolutionary trends of domains and combinations are analyzed.

Conclusions/Significance: We show that this approach provides a powerful tool to study how new proteins and functions emerged and to study such processes as horizontal gene transfer among more distant species.
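To make the idea of a parsimonious presence/absence history on a species tree concrete, the sketch below applies Fitch small parsimony, a standard textbook technique, to a single domain. This is a stand-in chosen for illustration and is not the methodology of the paper, which also models vertical and horizontal transfer. The tree, the leaf names, and the function name `fitch` are hypothetical.

```python
def fitch(tree, presence):
    """Fitch small parsimony for one binary character on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    presence: dict mapping each leaf name to True (domain present) or False.
    Returns (set of candidate root states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {presence[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, presence)
    right_states, right_cost = fitch(right, presence)
    common = left_states & right_states
    if common:
        # Children agree on a state: no change needed at this node.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical species tree: ((A, B), C) and (D, E) joined at the root.
tree = ((("A", "B"), "C"), ("D", "E"))

# Domain confined to one clade: explained by a single change.
clade_only = {"A": True, "B": True, "C": False, "D": False, "E": False}
root_states, changes = fitch(tree, clade_only)

# Patchy distribution across distant leaves: needs two changes.
patchy = {"A": True, "B": False, "C": False, "D": True, "E": False}
patchy_root, patchy_changes = fitch(tree, patchy)
```

In the first case the root is inferred to lack the domain and one change suffices, consistent with a single gain on the branch leading to A and B. In the second case two changes are required, and a patchy pattern of this kind is what a fuller model might attribute to independent gains, losses, or horizontal transfer.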
The primary structure of a multifunctional protein, the large α-subunit of the Escherichia coli fatty acid oxidation complex, was determined by sequencing the fadB region of the fadBA operon. The amino-terminal sequence of this protein had been established by Edman degradation. The transcription start site of the fadBA operon was located 42 nucleotides upstream of the initiator codon of the fadB gene by primer extension analysis. Sequences of -10 and -35 regions of the promoter responsible for interaction with RNA polymerase were found to be CACACT and TTTGCA, respectively. The location of the promoter of the fadBA operon was defined, and the transcription direction of this operon, from fadB to fadA, as previously proposed [Yang, S.-Y., et al. (1990) J. Biol. Chem. 265, 10424-10429], was corroborated. The multifunctional protein is composed of 729 amino acid residues and has a calculated Mr of 79,593. A putative NAD-binding βαβ-fold necessary for L-3-hydroxyacyl-CoA dehydrogenase function was found in the central region of the fadB gene product. Sequence analyses suggest that the functional domains of the multifunctional protein are arranged in the order enoyl-CoA hydratase:L-3-hydroxyacyl-CoA dehydrogenase:Δ3-cis-Δ2-trans-enoyl-CoA isomerase and suggest that the genes of the E. coli multifunctional protein and rat peroxisomal trifunctional β-oxidation enzyme evolved from a common ancestral gene.