Many regulatory proteins bind peptide regions of target proteins and modulate their activity. Such regulatory proteins can often interact with highly diverse target peptides. In many instances, it is not known if the peptide-binding interface discriminates targets in a biological context, or whether biological specificity is achieved exclusively through external factors such as subcellular localization. We used an evolutionary biochemical approach to distinguish these possibilities for two such low-specificity proteins: S100A5 and S100A6. We used isothermal titration calorimetry to study the binding of peptides with diverse sequence and biochemistry to human S100A5 and S100A6. These proteins bound distinct, but overlapping, sets of peptide targets. We then studied the peptide binding properties of orthologs sampled from across five amniote species. Binding specificity was conserved along all lineages, for the last 320 million years, despite the low specificity of each protein. We used ancestral sequence reconstruction to determine the binding specificity of the last common ancestor of the paralogs. The ancestor bound the entire set of peptides bound by modern S100A5 and S100A6 proteins, suggesting that paralog specificity evolved via subfunctionalization. To rule out the possibility that specificity is conserved because it is difficult to modify, we identified a single historical mutation that, when reverted in human S100A5, gave it the ability to bind an S100A6-specific peptide. These results reveal strong evolutionary constraints on peptide binding specificity. Despite being able to bind a large number of targets, the specificity of S100 peptide interfaces is likely important for the biology of these proteins.
Many proteins interact with short linear regions of target proteins. For some proteins, however, it is difficult to identify a well-defined sequence motif that defines their target peptides. To overcome this difficulty, we used supervised machine learning to train a model that treats each peptide as a collection of easily calculated biochemical features rather than as an amino acid sequence. As a test case, we dissected the peptide-recognition rules for human S100A5 (hA5), a low-specificity calcium-binding protein. We trained a Random Forest model against a recently released, high-throughput phage display dataset collected for hA5. The model identifies hydrophobicity and shape complementarity, rather than polar contacts, as the primary determinants of peptide binding specificity in hA5. We tested this hypothesis by solving a crystal structure of hA5 and by computationally docking diverse peptides onto hA5. These structural studies revealed that peptides exhibit multiple binding modes at the hA5 peptide interface, all of which have few polar contacts with hA5. Finally, we used our trained model to predict new, plausible binding targets in the human proteome. This revealed a fragment of the protein α-1-syntrophin that binds to hA5. Our work helps us better understand the biochemistry and biology of hA5 and demonstrates how high-throughput experiments coupled with machine learning of biochemical features can reveal the determinants of binding specificity in low-specificity proteins.
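The idea of treating a peptide as a collection of biochemical features, rather than as an ordered sequence, can be illustrated with a minimal Python sketch. The feature set below (length, mean hydropathy, net charge, aromatic fraction), the use of the Kyte-Doolittle hydropathy scale, and the function name `peptide_features` are assumptions chosen for illustration; the abstract does not list the features the authors actually used.

```python
# Kyte-Doolittle hydropathy values (a standard published scale). Using this
# particular scale is an assumption, not necessarily the authors' choice.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def peptide_features(peptide):
    """Return simple, order-independent features for one peptide (hypothetical set)."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(KYTE_DOOLITTLE)
    if not peptide or unknown:
        raise ValueError(f"empty peptide or unknown residues: {sorted(unknown)}")
    n = len(peptide)
    return {
        "length": n,
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in peptide) / n,
        "net_charge": sum(CHARGE.get(aa, 0) for aa in peptide),
        "aromatic_fraction": sum(aa in "FWY" for aa in peptide) / n,
    }
```

Each peptide then becomes one row of numbers, and rows of this kind, paired with binding labels from a high-throughput experiment, are what a supervised model such as a Random Forest would be trained on.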
Keywords: S100 proteins, machine learning, X-ray crystallography, binding specificity, peptides, hydrophobicity

Up to 40% of protein-protein interactions are mediated by globular domains that recognize a short, linear fragment of their interaction partner. 1,2 Such protein-peptide interactions play key roles in processes ranging from signaling networks to biological phase transitions. 2,3 Understanding such systems therefore requires knowing which proteins recognize which peptides under what conditions. 2,4

Protein-peptide interaction interfaces exhibit a wide range of specificity. For some proteins, one can describe specificity using a simple binding motif that encodes the amino acid(s) recognized at each site in the peptide. 5,6 One can predict protein targets by searching for matching sequences within the proteome. 6 Some proteins deviate from this highly specific paradigm, requiring more sophisticated approaches. For example, many PDZ binding domains exhibit binding "multi-specificity", in which peptide preference must be represented as a handful of binding motifs. 7,8 Predicting interaction targets for such proteins is more difficult than for proteins with single binding motifs, but the same basic logic applies: search the proteome for sequences that match the binding motifs.

Even more extreme cases exist, such as S100 proteins. Members of this family of calcium-activated signaling prot...
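The motif-search logic described above for motif-specific and multi-specific proteins (scan the proteome for sequences that match one or more binding motifs) can be sketched in Python with regular expressions. The function name `find_motif_hits`, the toy sequences, and the two motif patterns are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import re


def find_motif_hits(proteome, motifs):
    """Return (protein_id, motif_name, start, matched_text) for every match.

    proteome: dict mapping protein ID -> amino acid sequence
    motifs:   dict mapping motif name -> regular expression
    """
    hits = []
    for motif_name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        regex = re.compile(f"(?=({pattern}))")
        for protein_id, sequence in proteome.items():
            for match in regex.finditer(sequence):
                hits.append((protein_id, motif_name, match.start(), match.group(1)))
    return hits


# Hypothetical toy data, for illustration only.
toy_proteome = {
    "protA": "MKTLLSAETSV",
    "protB": "MGGPLRSPAA",
}
toy_motifs = {
    "motif_1": r"[ST].[VIL]$",  # S/T, any residue, hydrophobic C-terminal residue
    "motif_2": r"P.[RK]",       # proline, any residue, basic residue
}

hits = find_motif_hits(toy_proteome, toy_motifs)
```

Supplying several patterns in `motifs` corresponds to the multi-specificity case, where a handful of motifs is needed. This approach presupposes that a well-defined motif exists, which is exactly what is hard to establish for low-specificity proteins.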
S100 proteins bind linear peptide regions of target proteins and modulate their activity. The peptide binding interface, however, has remarkably low specificity and can interact with many target peptides. It is not clear if the interface discriminates targets in a biological context, or whether biological specificity is achieved exclusively through external factors such as subcellular localization. To discriminate these possibilities, we used an evolutionary biochemical approach to trace the evolution of paralogs S100A5 and S100A6. We first used isothermal titration calorimetry to study the binding of a collection of peptides with diverse sequence, hydrophobicity, and charge to human S100A5 and S100A6. These proteins bound distinct, but overlapping, sets of peptide targets. We then studied the peptide binding properties of S100A5 and S100A6 orthologs sampled from across five representative amniote species. We found that the pattern of binding specificity was conserved along all lineages, for the last 320 million years, despite the low specificity of each protein. We next used Ancestral Sequence Reconstruction to determine the binding specificity of the last common ancestor of the paralogs. We found the ancestor bound the whole set of peptides bound by modern S100A5 and S100A6 proteins, suggesting that paralog specificity evolved by subfunctionalization. To rule out the possibility that specificity is conserved because it is difficult to modify, we identified a single historical mutation that, when reverted in human S100A5, gave it the ability to bind an S100A6-specific peptide. These results indicate that there are strong evolutionary...