Determination of the tendencies of amino acids to form alpha-helical and beta-sheet structures has been important in clarifying stabilizing interactions, protein design, and the protein folding problem. In this study, we have determined for the first time a complete scale of amino acid propensities for another important protein motif: the collagen triple-helix conformation with its Gly-X-Y repeating sequence. Guest triplets of the form Gly-X-Hyp and Gly-Pro-Y are used to quantitate the conformational propensities of all 20 amino acids for the X and Y positions in the context of a (Gly-Pro-Hyp)(8) host peptide. The rankings for both the X and Y positions show the highly stabilizing nature of imino acids and the destabilizing effects of Gly and aromatic residues. Many residues show differing propensities in the X versus Y position, related to the nonequivalence of these positions in terms of interchain interactions and solvent exposure. The propensity of amino acids to adopt a polyproline II-like conformation plays a role in their triple-helix rankings, as shown by a moderate correlation of triple-helix propensity with frequency of occurrence in polyproline II-like regions. The high propensity of ionizable residues in the X position suggests the importance of interchain hydrogen bonding directly or through water to backbone carbonyls or hydroxyprolines. The low propensity of side chains with branching at the C(delta) in the Y position supports models suggesting these groups block solvent access to backbone C=O groups. These data provide a first step in defining sequence-dependent variations in local triple-helix stability and binding, and are important for a general understanding of side chain interactions in all proteins.
An algorithm was derived to relate the amino acid sequence of a collagen triple helix to its thermal stability. This calculation is based on the triple helical stabilization propensities of individual residues and their intermolecular and intramolecular interactions, as quantitated by melting temperature values of host-guest peptides. Experimental melting temperature values of a number of triple helical peptides of varying length and sequence were successfully predicted by this algorithm. However, predicted Tm values are significantly higher than experimental values when there are strings of oppositely charged residues or concentrations of like charges near the terminus. Application of the algorithm to collagen sequences highlights regions of unusually high or low stability, and these regions often correlate with biologically significant features. The prediction of stability from sequence indicates an understanding of the major forces maintaining this protein motif. The use of highly favorable KGE and KGD sequences is seen to complement the stabilizing effects of imino acids in modulating stability and may become dominant in the collagenous domains of bacterial proteins that lack hydroxyproline. The effect of single amino acid mutations in the X and Y positions can be evaluated with this algorithm. An interactive collagen stability calculator based on this algorithm is available online.
The ability to predict structure and stability from amino acid sequence is an important step in the understanding of basic protein principles and the structural consequences of pathological mutations. The vast number of amino acid sequences available from DNA data contrasts with the smaller number of high resolution protein structures and the limited experimental data on protein stability. The ability to make predictions that are in good agreement with experimental data provides insight into the stabilizing interactions within proteins.
In addition, there is much interest in computing the effect of single amino acid replacements on protein stability because destabilizing effects are associated with deleterious mutations that result in clinically detectable phenotypes (1-3). In contrast to globular proteins, the relation among sequence, structure, and stability is simpler and better defined for the linear collagen triple helix.
The collagen triple helix motif is found widely in structural proteins of the extracellular matrix and in an increasing set of non-collagenous proteins, many of which are involved in host-defense functions (4, 5). The close packing of three supercoiled polyproline II-like polypeptide chains in the collagen triple helix generates a requirement for Gly as every third residue (6-8). The observation of such a repeating (Gly-X-Y)n sequence pattern over a stretch of residues signifies a triple helix conformation. However, the collagen triple helix is not uniform in structure or stability. Crystal structures of collagen peptides show that variation in amino acid content leads to small but significant variations i...
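The sequence criterion described above (Gly as every third residue over a stretch of residues) can be illustrated with a minimal Python sketch. The function name `longest_gxy_run` and the choice to scan all three reading frames are assumptions made for illustration; this is not the published stability algorithm and it does not estimate Tm.

```python
def longest_gxy_run(seq: str) -> tuple[int, int]:
    """Return (start_index, n_triplets) of the longest stretch in which
    Gly ("G") occupies every third position, using one-letter codes.

    All three reading frames are scanned; ties go to the earliest frame.
    """
    seq = seq.upper()
    best_start, best_len = 0, 0
    for frame in range(3):
        run_start, run_len = 0, 0
        i = frame
        # Walk the sequence one complete triplet at a time in this frame.
        while i + 3 <= len(seq):
            if seq[i] == "G":
                if run_len == 0:
                    run_start = i
                run_len += 1
                if run_len > best_len:
                    best_start, best_len = run_start, run_len
            else:
                run_len = 0  # the Gly-X-Y repeat is interrupted here
            i += 3
    return best_start, best_len
```

For example, `longest_gxy_run("AAGPKGDPGEP")` reports a run of three triplets starting at index 2. Only the Gly repeat is checked; the identities of the X and Y residues are ignored.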
Proteins with sequence-specific DNA binding function are important for a wide range of biological activities. De novo prediction of their DNA-binding specificities from sequence alone would be a great aid in inferring cellular networks. Here we introduce a method for predicting DNA-binding specificities for Cys2His2 zinc fingers (C2H2-ZFs), the largest family of DNA-binding proteins in metazoans. We develop a general approach, based on empirical calculations of pairwise amino acid–nucleotide interaction energies, for predicting position weight matrices (PWMs) representing DNA-binding specificities for C2H2-ZF proteins. We predict DNA-binding specificities on a per-finger basis and merge predictions for C2H2-ZF domains that are arrayed within sequences. We test our approach on a diverse set of natural C2H2-ZF proteins with known binding specificities and demonstrate that for >85% of the proteins, their predicted PWMs are accurate in 50% of their nucleotide positions. For proteins with several zinc finger isoforms, we show via case studies that this level of accuracy enables us to match isoforms with their known DNA-binding specificities. A web server for predicting a PWM given a protein containing C2H2-ZF domains is available online at http://zf.princeton.edu and can be used to aid in protein engineering applications and in genome-wide searches for transcription factor targets.
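The abstract does not state how interaction energies are turned into PWM columns. One common convention, assumed here purely for illustration, is Boltzmann weighting: each base at a position receives a probability proportional to exp(-E/kT). The function name `energies_to_pwm`, the `kT` parameter, and the input layout (one dictionary of per-base energies per position) are hypothetical and are not taken from the paper or its web server.

```python
import math

def energies_to_pwm(energies: list[dict[str, float]], kT: float = 1.0) -> list[dict[str, float]]:
    """Convert per-position base energies into PWM columns.

    energies: one dict per DNA position mapping base -> energy
              (lower energy means more favorable; units must match kT).
    Returns one dict per position mapping base -> probability.
    """
    pwm = []
    for column in energies:
        # Boltzmann weight of each base at this position (assumed convention).
        weights = {base: math.exp(-e / kT) for base, e in column.items()}
        total = sum(weights.values())
        pwm.append({base: w / total for base, w in weights.items()})
    return pwm
```

With this convention, a position whose four energies are equal yields a uniform column, and lowering one base's energy raises its probability at the expense of the others.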
Important stabilizing features for the collagen triple helix include the presence of Gly as every third residue, a high content of imino acids, and interchain hydrogen bonds. Host-guest peptides have been used previously to characterize triple-helix propensities of individual residues and Gly-X-Y triplets. Here, comparison of the thermal stabilities of host-guest peptides of the form (Gly-Pro-Hyp)3-Gly-X-Y-Gly-X'-Y'-(Gly-Pro-Hyp)3 extends the study to adjacent tripeptide sequences, to encompass the major classes of potential direct intramolecular interactions. Favorable hydrophobic interactions were observed, as well as stabilizing intrachain interactions between residues of opposite charge in the i and i + 3 positions. However, the greatest gain in triple-helix stability was achieved in the presence of Gly-Pro-Lys-Gly-Asp/Glu-Hyp sequences, leading to a T(m) value equal to that seen for a Gly-Pro-Hyp-Gly-Pro-Hyp sequence. This stabilization is seen for Lys but not for Arg and can be assigned to interchain ion pairs, as shown by molecular modeling. Computational analysis shows that Lys-Gly-Asp/Glu sequences are present at a frequency much greater than expected in collagen, suggesting this interaction is biologically important. These results add significantly to the understanding of which surface ion pairs can contribute to protein stability.
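A rough way to ask whether Lys-Gly-Asp/Glu occurs more often than expected is sketched below in Python. The abstract does not say how the expected frequency was computed, so this sketch assumes a simple independence model: the chance of Lys in a Y position times the chance of Asp/Glu in an X position, multiplied by the number of adjacent triplet pairs. The function name `kgde_observed_expected` and the requirement that the input start in frame at a Gly are illustrative assumptions.

```python
def kgde_observed_expected(seq: str) -> tuple[int, float]:
    """Count Y=Lys followed by Gly and X'=Asp/Glu in an in-frame (Gly-X-Y)n
    sequence (one-letter codes), and estimate the count expected if Lys(Y)
    and Asp/Glu(X) occurred independently.
    """
    seq = seq.upper()
    # Keep only complete triplets.
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if len(triplets) < 2:
        raise ValueError("need at least two complete triplets")
    if any(t[0] != "G" for t in triplets):
        raise ValueError("sequence must be in frame with Gly first in every triplet")

    pairs = list(zip(triplets, triplets[1:]))
    # Observed: Lys in Y of one triplet, Asp or Glu in X of the next.
    observed = sum(1 for a, b in pairs if a[2] == "K" and b[1] in "DE")

    p_lys_y = sum(t[2] == "K" for t in triplets) / len(triplets)
    p_acid_x = sum(t[1] in "DE" for t in triplets) / len(triplets)
    expected = p_lys_y * p_acid_x * len(pairs)
    return observed, expected
```

An observed count well above the expected value would be consistent with enrichment, although a real analysis would need a statistical test and a much larger sequence set than a single chain.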
The folding of collagen in vitro is very slow and presents difficulties in reaching equilibrium, a feature that may have implications for in vivo collagen function. Peptides serve as good model systems for examining equilibrium thermal transitions in the collagen triple helix. Investigations were carried out to ascertain whether a range of synthetic triple-helical peptides of varying sequences can reach equilibrium, and whether the triple helix to unfolded monomer transition approximates a two-state model. The thermal transitions for all peptides studied are fully reversible given sufficient time. Isothermal experiments were carried out to obtain relaxation times at different temperatures. The slowest relaxation times, on the order of 10-15 h, were observed at the beginning of transitions, and were shown to result from self-association limited by the low concentration of free monomers, rather than cis-trans isomerization. Although the fit of the CD equilibrium transition curves and the concentration dependence of Tm values support a two-state model, the more rigorous comparison of the calorimetric enthalpy to the van't Hoff enthalpy indicates the two-state approximation is not ideal. Previous reports of melting curves of triple-helical host-guest peptides are shown to be a two-state kinetic transition, rather than an equilibrium transition.
Keywords: collagen; triple helix; peptide; equilibrium; thermodynamics; two-state model; relaxation
The folding of multimeric fibrous proteins is complicated, in contrast with the classic experiments of Anfinsen, that demonstrated reversible refolding of monomer globular proteins (Anfinsen 1973; Jaenicke and Lilie 2000). Many small globular proteins undergo very fast unfolding and refolding, leading to rapid establishment of equilibrium.
When these conditions apply and only the native and unfolded states are significantly populated, thermodynamic analysis for a two-state model can be applied to clarify stabilizing interactions.
Fibrous proteins are more complicated because of their multichain nature, their length, and the linear nature of their superhelical motifs (Beck and Brodsky 1998). The coiled-coil α-helical domain of tropomyosin consists of multiple cooperative units of different stability, and the triple-helical region of collagen molecule contains multiple cooperative units with similar stability (Privalov 1982). Although the coiled-coil tropomyosin molecule reaches equilibrium very quickly, the reversibility of transitions of the collagen triple helix is controversial (Miles 1993; Engel and Bächinger 2000; Bächinger and Engel 2001). The nature of the collagen triple helix to unfolded monomer transition is approached here through studies on collagen-like model peptides.
The collagen triple helix has unique sequence and conformational features that could influence folding and unfolding, as well as the ability to reach equilibrium. The family of collagens includes at least 27 distinct genetic types that are united by their common triple-helix motif (Kielty and Grant 20...
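The two-state analysis mentioned above can be sketched for a homotrimer in Python. The sketch assumes the equilibrium 3M ⇌ T, a total chain concentration c0, a fraction folded F, and a temperature-independent folding enthalpy (the van't Hoff approximation). Under those assumptions K = F / (3 c0² (1 − F)³), so K(Tm) = 4 / (3 c0²) at F = 0.5. The function names are hypothetical, and this is a textbook-style model rather than the fitting procedure used in the paper.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def k_at_tm(c0: float) -> float:
    """Equilibrium constant of 3M <-> T at the midpoint (F = 0.5),
    for total chain concentration c0 in mol/L."""
    return 4.0 / (3.0 * c0 ** 2)

def fraction_folded(temp_k: float, tm_k: float, delta_h: float) -> float:
    """Two-state fraction folded at temp_k (kelvin) for a trimer.

    delta_h is the folding enthalpy in J/mol (negative for exothermic
    folding), assumed constant with temperature.
    """
    # van't Hoff: ln K(T) = ln K(Tm) - (dH/R) * (1/T - 1/Tm).
    # The combination a = 3 * c0^2 * K(T) equals 4 at Tm, so c0 cancels.
    a = 4.0 * math.exp(-(delta_h / R) * (1.0 / temp_k - 1.0 / tm_k))
    # Solve F = a * (1 - F)^3 by bisection; the left minus the right side
    # increases monotonically from -a at F = 0 to 1 at F = 1.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - a * (1.0 - mid) ** 3 < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because K(Tm) depends on c0, this model predicts a concentration-dependent Tm, which is the behaviour the passage cites as support for a two-state description. Comparing the van't Hoff enthalpy from such a fit with the calorimetric enthalpy is the more stringent check.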
Interest in self-association of peptides and proteins is motivated by an interest in the mechanism of physiologically higher order assembly of proteins such as collagen as well as the mechanism of pathological aggregation such as β-amyloid formation. The triple helical form of (Pro-Hyp-Gly)10, a peptide that has proved a useful model for molecular features of collagen, was found to self-associate, and its association properties are reported here. Turbidity experiments indicate that the triple helical peptide self-assembles at neutral pH via a nucleation-growth mechanism, with a critical concentration near 1 mM. The associated form is more stable than individual molecules by about 25°C, and the association is reversible. The rate of self-association increases with temperature, supporting an entropically favored process. After self-association, (Pro-Hyp-Gly)10 forms branched filamentous structures, in contrast with the highly ordered axially periodic structure of collagen fibrils. Yet a number of characteristics of triple helix assembly for the peptide resemble those of collagen fibril formation. These include promotion of fibril formation by neutral pH and increasing temperature; inhibition by sugars; and a requirement for hydroxyproline. It is suggested that these similar features for peptide and collagen self-association are based on common lateral underlying interactions between triple helical molecules mediated by hydrogen-bonded hydration networks involving hydroxyproline.
There is increasing interest in the ability of proteins and peptides to self-associate into aggregates, both in normal and pathological processes. Normal self-association processes include fibril formation of collagen and polymerization of actin (1, 2), whereas pathological aggregation of amyloid peptides, α-synuclein, and prions is implicated in neurodegenerative diseases (3, 4).
Interest has focused on the nature of protein aggregation and the molecular and environmental determinants of the self-association process. The study of the ability of collagen-like peptides to aggregate offers an opportunity to characterize a unique system, which may relate to the physiological self-association of collagen molecules.
Collagen, the major structural protein in the extracellular matrix, has a characteristic triple helical conformation, consisting of three polyproline II-like chains that are supercoiled around a common axis (5-7). The close packing of the three chains near the central axis generates a requirement for Gly as every third residue, (Gly-X-Y)n, whereas the high content of imino acids Pro and hydroxyproline (Hyp) stabilizes the individual polyproline II-like helices. Although imino acids are highly favorable for the triple helix, the post-translational modification of Pro to Hyp in the Y position confers an additional stabilizing contribution. This further stabilization of Hyp is likely to result from stereoelectronic promotion of the more favorable exo ring pucker for the Y position and Hyp involvement in solvent-mediated hydrogen bonding...
An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.