Evolution of complexity in eukaryotic proteomes has arisen, in part, through emergence of modular independently folded domains mediating protein interactions via binding to short linear peptides in proteins. Over 30 years, structural properties and sequence preferences of these peptides have been extensively characterized. Less successful, however, were efforts to establish relationships between physicochemical properties and functions of domain-peptide interactions. To our knowledge, we have devised the first strategy to exhaustively explore the binding specificity of protein domain-peptide interactions. We applied the strategy to SH3 domains to determine the properties of their binding peptides starting from various experimental data. The strategy identified the majority (∼70%) of experimentally determined SH3 binding sites. We discovered mutual relationships among binding specificity, binding affinity, and structural properties and evolution of linear peptides. Remarkably, we found that these properties are also related to functional diversity, defined by depth of proteins within hierarchies of gene ontologies. Our results revealed that linear peptides evolved to coadapt specificity and affinity to functional diversity of domain-peptide interactions. Thus, domain-peptide interactions follow human-constructed gene ontologies, which suggests that our understanding of biological process hierarchies reflects the way chemical and thermodynamic properties of linear peptides and their interaction networks, in general, have evolved.
linear peptides | domain-peptide interactions | binding specificity | binding affinity | functional specificity
Many proteins, particularly in eukaryotes, are composed of modular protein architectures consisting of multiple independently folding domains (1). Specific domains such as SH3 and PDZ domains were repeatedly used throughout evolution in increasingly complex organisms to mediate protein-protein interactions involved in signal transduction and protein targeting (2-5). These domains are associated with a number of human diseases and are targets of viral and other pathogen virulence proteins (6). Functions of these domains include binding to sequence-specific peptides both among themselves and on other proteins. Such interactions can create enormous plasticity in complex signaling and regulatory networks on immediate to evolutionary timescales (7), and are often used for regulating the activities of proteins and the spatiotemporal organization of protein interaction networks (8,9). However, at the cellular level, we still do not grasp why certain peptides in proteins bind to distinct domains with high specificity whereas others cross-react extensively with several members of a domain family, nor do we understand the relationship between specificity of binding and specificity of function of domain-peptide interactions. Two extreme examples are peptides of the MAPKK protein Pbs2 (residues 92-106) (10) and the actin assembly protein Las17 (residues 306-336) (11), which both interact with ...
Background: The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions".
High-throughput in vitro methods have been extensively applied to identify linear information that encodes peptide recognition. However, these methods are limited in the number of peptides, sequence variation, and length of peptides that can be explored, and often produce solutions that are not found in the cell. Despite the large number of methods developed to address these issues, the exhaustive search of linear information encoding protein-peptide recognition has so far been physically unfeasible. Here, we describe a strategy, called DALEL, for the exhaustive search of linear sequence information encoded in proteins that bind to a common partner. We applied DALEL to explore binding specificity of SH3 domains in the budding yeast Saccharomyces cerevisiae. Using only the polypeptide sequences of SH3 domain binding proteins, we succeeded in identifying the majority of known SH3 binding sites previously discovered either in vitro or in vivo. Moreover, we discovered a number of sites with both non-canonical sequences and distinct properties that may serve ancillary roles in peptide recognition. We compared DALEL to a variety of state-of-the-art algorithms in the blind identification of known binding sites of the human Grb2 SH3 domain. We also benchmarked DALEL on curated biological motifs derived from the ELM database to evaluate the effect of increasing or decreasing the enrichment of the motifs. Our strategy can be applied in conjunction with experimental data of proteins interacting with a common partner to identify binding sites among them. It can also be applied to any group of proteins of interest to identify enriched linear motifs or to exhaustively explore the space of linear information encoded in a polypeptide sequence. Finally, we have developed a webserver located at http://michnick.bcm.umontreal.ca/dalel, offering a user-friendly interface and providing different scenarios utilizing DALEL.
Many proteins involved in signal transduction contain peptide recognition modules (PRMs) that recognize short linear motifs (SLiMs) within their interaction partners. Here, we used large‐scale peptide‐phage display methods to derive optimal ligands for 163 unique PRMs representing 79 distinct structural families. We combined the new data with previous data that we collected for the large SH3, PDZ, and WW domain families to assemble a database containing 7,984 unique peptide ligands for 500 PRMs representing 82 structural families. For 74 PRMs, we acquired enough new data to map the specificity profiles in detail and derived position weight matrices and binding specificity logos based on multiple peptide ligands. These analyses showed that optimal peptide ligands resembled peptides observed in existing structures of PRM‐ligand complexes, indicating that a large majority of the phage‐derived peptides are likely to target natural peptide‐binding sites and could thus act as inhibitors of natural protein–protein interactions. The complete dataset has been assembled in an online database (http://www.prm-db.org) that will enable many structural, functional, and biological studies of PRMs and SLiMs.
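As a rough illustration of what deriving a position weight matrix from multiple peptide ligands can involve, the sketch below builds log-odds scores from a set of pre-aligned, equal-length peptides. It is not the procedure used for the database: the uniform background, the pseudocount of 1, the function names, and the four example peptides are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(peptides, pseudocount=1.0):
    """Return one dict per position mapping residue -> log2-odds score.

    Assumes the peptides are already aligned and of equal length, and
    uses a uniform background frequency (both are simplifications).
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    width = len(peptides[0])
    if any(len(p) != width for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for column in zip(*peptides):  # iterate over alignment columns
        counts = Counter(column)
        pwm.append({
            aa: log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score(pwm, peptide):
    """Sum the per-position log-odds scores for a candidate peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))


# Hypothetical ligands, invented for this example only.
ligands = ["PPLP", "PALP", "PPLP", "APLP"]
pwm = position_weight_matrix(ligands)
print(score(pwm, "PPLP"), score(pwm, "GGGG"))
```

A specificity logo is a graphical rendering of the same per-position frequencies, so the column counts computed here would be the input to a logo-drawing tool.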
Integrins are transmembrane multi-conformation receptors that mediate interactions with the extracellular matrix. In cancer, integrins influence metastasis, proliferation, and survival. Collagen-binding integrin-α11/β1, a marker of aggressive tumors that is involved in stroma-tumor crosstalk, may be an attractive target for anti-cancer therapeutic antibodies. We performed selections with phage-displayed synthetic antibody libraries for binding to either purified integrin-α11/β1 or in situ on live cells. The in situ strategy yielded many diverse antibodies, and strikingly, most of these antibodies did not recognize purified integrin-α11/β1. Conversely, none of the antibodies selected for binding to purified integrin-α11/β1 were able to efficiently recognize native cell-surface antigen. Most importantly, only the in-situ selection yielded functional antibodies that were able to compete with collagen-I for binding to cell-surface integrin-α11/β1, and thus inhibited cell adhesion. In-depth characterization of a subset of in situ-derived clones as full-length immunoglobulins revealed high affinity cellular binding and inhibitory activities in the single-digit nanomolar range. Moreover, the antibodies showed high selectivity for integrin-α11/β1 with minimal cross-reactivity for close homologs. Taken together, our findings highlight the advantages of in-situ selections for generation of anti-integrin antibodies optimized for recognition and inhibition of native cell-surface proteins, and our work establishes general methods that could be extended to many other membrane proteins.
CLUSS is an algorithm proposed for clustering both alignable and non-alignable protein sequences. However, CLUSS tends to be ineffective on protein datasets that include a large number of biochemical activities. To overcome this difficulty, we propose in this paper a new algorithm, named CLUSS2, that scales better as the number of biochemical activities increases. CLUSS2 differs from CLUSS in many ways, including protein sequence representation, conserved motif extraction, and time efficiency. Our experiments show that CLUSS2 more accurately highlights the functional characteristics of the clustered families, especially for those with a large number of biochemical activities.
Linear motifs mediate a wide variety of cellular functions, which makes their characterization in protein sequences crucial to understanding cellular systems. However, the short length and degenerate nature of linear motifs make their discovery a difficult problem. Here, we introduce MotifHound, an algorithm particularly suited for the discovery of small and degenerate linear motifs. MotifHound performs an exact and exhaustive enumeration of all motifs present in proteins of interest, including all of their degenerate forms, and scores the overrepresentation of each motif based on its occurrence in proteins of interest relative to a background (e.g., proteome) using the hypergeometric distribution. To assess MotifHound, we benchmarked it together with state-of-the-art algorithms. The benchmark consists of 11,880 sets of proteins from S. cerevisiae; in each set, we artificially spiked-in one motif varying in terms of three key parameters, (i) number of occurrences, (ii) length and (iii) the number of degenerate or “wildcard” positions. The benchmark enabled the evaluation of the impact of these three properties on the performance of the different algorithms. The results showed that MotifHound and SLiMFinder were the most accurate in detecting degenerate linear motifs. Interestingly, MotifHound was 15 to 20 times faster at comparable accuracy and performed best in the discovery of highly degenerate motifs. We complemented the benchmark by an analysis of proteins experimentally shown to bind the FUS1 SH3 domain from S. cerevisiae. Using the full-length protein partners as sole information, MotifHound recapitulated most experimentally determined motifs binding to the FUS1 SH3 domain. Moreover, these motifs exhibited properties typical of SH3 binding peptides, e.g., high intrinsic disorder and evolutionary conservation, despite the fact that none of these properties were used as prior information. 
MotifHound is available (http://michnick.bcm.umontreal.ca or http://tinyurl.com/motifhound) together with the benchmark that can be used as a reference to assess future developments in motif discovery.
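The enumerate-then-score idea described above can be sketched in a few lines. This is a minimal illustration, not the MotifHound implementation: the function names, the choice to count each motif once per protein, the restriction of wildcards to interior positions, and the toy sequences are all assumptions, and the p-values are left uncorrected for multiple testing.

```python
from itertools import combinations
from math import comb


def degenerate_forms(kmer, max_wildcards):
    """Yield the k-mer and its variants with interior positions set to '.'."""
    interior = range(1, len(kmer) - 1)  # keep first and last residue fixed
    for n_wild in range(max_wildcards + 1):
        for positions in combinations(interior, n_wild):
            chars = list(kmer)
            for pos in positions:
                chars[pos] = "."
            yield "".join(chars)


def motifs_in(sequence, length, max_wildcards):
    """Exhaustively collect every motif (exact or degenerate) in a sequence."""
    found = set()
    for start in range(len(sequence) - length + 1):
        found.update(degenerate_forms(sequence[start:start + length], max_wildcards))
    return found


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n proteins from N, of which K carry the motif."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def rank_motifs(proteome, foreground_ids, length=4, max_wildcards=2):
    """Rank motifs seen in the foreground by overrepresentation vs. the proteome."""
    per_protein = {pid: motifs_in(seq, length, max_wildcards)
                   for pid, seq in proteome.items()}
    N, n = len(proteome), len(foreground_ids)
    candidates = set().union(*(per_protein[pid] for pid in foreground_ids))
    results = []
    for motif in candidates:
        K = sum(motif in motifs for motifs in per_protein.values())
        k = sum(motif in per_protein[pid] for pid in foreground_ids)
        results.append((motif, k, K, hypergeom_tail(k, N, K, n)))
    return sorted(results, key=lambda r: (r[3], r[0]))  # smallest p-value first


# Toy background of six made-up sequences; P1 and P2 are the proteins of interest.
proteome = {
    "P1": "GGPAAPGG", "P2": "TTPLLPTT",
    "B1": "ACDEFGHI", "B2": "KLMNQRST", "B3": "VWYACDEF", "B4": "GHIKLMNQ",
}
ranked = rank_motifs(proteome, ["P1", "P2"])
print(ranked[0])
```

In this toy set the degenerate motif `P..P` is the only one shared by both proteins of interest, so it receives the smallest p-value. On a real proteome the number of enumerated motifs grows quickly with length and wildcard count, which is why an efficient enumeration matters in practice.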