Abstract: The degree of informatic independence between the physical properties of amino acids as encoded in actual protein sequences is calculated. It is shown that no physical property can be identified that carries significantly less information than others and that the information overlap between different properties and different length scales along the sequence is essentially zero. These observations suggest that bioinformatic models based on arbitrarily selected sets of physical properties are inherently deficien…
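The abstract does not say how the information overlap was estimated. One common way to quantify overlap between two property profiles along a sequence is their mutual information. The sketch below assumes both profiles have already been discretized into integer bins (the binning step is not shown), and the function name `mutual_information` is hypothetical, not taken from the source.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information, in bits, between two equal-length discrete sequences.

    Hypothetical helper: x and y are assumed to be property profiles that were
    already discretized into bins, one value per residue position.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    px = Counter(x)           # marginal counts of x
    py = Counter(y)           # marginal counts of y
    pxy = Counter(zip(x, y))  # joint counts of (x, y) pairs
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # I(X;Y) = sum p(a,b) * log2( p(a,b) / (p(a) p(b)) )
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

Identical profiles return the full entropy of the profile (1 bit for a balanced two-bin profile), while statistically independent profiles return values at or near zero, the regime the abstract describes as essentially zero overlap.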
“…The behavior summarized in Figure is qualitatively different from that observed when the 10 static property factors were compared in the same way. In that case, we found that all the significant sequence differences between architectural groups, at every level of the hierarchy, fall in the first bin (0 ≤ k ≤ 10).…”
Section: Results; classified as contrasting (confidence: 60%)
“…If this is the case, we expect that groups of proteins that fold to different architectures will exhibit statistically significant differences in their dynamic propensities. We ask whether this is so, by addressing two specific questions: Are there indeed statistically significant differences in the characteristics of dynamic sequences between groups of proteins known to fold to different architectures? Is the behavior of those characteristics different from that we have observed for static properties?…”
Section: Results; classified as mentioning (confidence: 86%)
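The quoted passage does not name the statistical test used. As a generic illustration only, a two-sided permutation test on the difference of group means is one way to ask whether a per-protein quantity (for example, a single Fourier component of a dynamic profile) differs between two architecture groups. The function `permutation_test` and its parameters are hypothetical and are not the authors' procedure.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def permutation_test(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_permutations: int = 10000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation test on the difference of group means.

    Returns (observed difference, estimated p-value). Hypothetical sketch:
    each value is assumed to be one protein's score for some sequence feature.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)  # seeded for reproducible estimates
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values simulates the null hypothesis that
        # group labels carry no information about the feature.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate from being exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

A permutation test makes no normality assumption about the feature, which is convenient when group sizes differ and the distribution of the feature is unknown.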
“…We address the following goals: We propose an amino acid property that can act as an informatic basis for dynamic analysis. We investigate quantitatively the degree to which amino acid physical characteristics explain the values of this property. We use this property to develop a global dynamic representation of protein sequences, similar to that we developed in previous work for the static physical properties of protein sequences. This representation leads to a quantitative dynamic signature for the sequence. We determine the extent to which this representation encodes information previously unavailable. We use the new representation to investigate the dynamic differences between groups of proteins that fold to different architectures…”
We examine the local and global properties of the average B‐factor, 〈B〉, as a residue‐specific indicator of protein dynamic characteristics. It has been shown that values of 〈B〉 for the 20 amino acids differ in a statistically significant manner, and that, while strongly determined by the static physical properties of amino acids, they also encode averaged information about the influence of global fold on single‐residue dynamics. Therefore, complete sequences of amino acids also encode fold‐related global dynamic information, in addition to the local information that arises from static physical properties. We show that the relative magnitudes of these two contributions can be determined using Fourier methods, which represent the global properties of the sequences. It has also been shown that the behavior of Fourier components of 〈B〉 differs, with very high statistical significance, between structural groups, and that this information is not available from a comparable analysis of static amino acid properties.
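The abstract describes a Fourier treatment of 〈B〉 sequences without giving its details. The sketch below assumes a plain discrete Fourier transform of the per-residue 〈B〉 profile and groups the amplitudes into consecutive wavenumber bins of width 11, so that the first bin is 0 ≤ k ≤ 10 as in the quoted Results passage. The widths of bins after the first, the absence of normalization, and the lookup table passed in are all assumptions; no real 〈B〉 values are included, and the function names are hypothetical.

```python
import cmath
from typing import Dict, List, Sequence


def b_profile(sequence: str, avg_b: Dict[str, float]) -> List[float]:
    """Map a sequence to a per-residue <B> profile via a caller-supplied table.

    The table is an input here; this sketch does not supply published values.
    """
    return [avg_b[aa] for aa in sequence]


def fourier_amplitudes(profile: Sequence[float]) -> List[float]:
    """Amplitudes |F_k| of the discrete Fourier transform for k = 0..N//2.

    Naive O(N^2) DFT for clarity; numpy.fft.rfft is the practical choice.
    """
    n = len(profile)
    amps = []
    for k in range(n // 2 + 1):
        f_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(profile))
        amps.append(abs(f_k))
    return amps


def bin_amplitudes(amps: Sequence[float], bin_width: int = 11) -> List[float]:
    """Sum amplitudes over consecutive k bins; the first bin is 0 <= k <= 10.

    Equal-width bins after the first are an assumption of this sketch.
    """
    return [sum(amps[i:i + bin_width]) for i in range(0, len(amps), bin_width)]
```

With a toy two-letter table such as `{"A": 1.0, "G": 3.0}`, the alternating sequence `"AGAGAGAG"` puts all of its non-constant amplitude at the highest wavenumber (k = N/2 = 4), as expected for a period-2 pattern; low-k bins instead capture slowly varying, sequence-wide trends.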
“…This representation carries 86% of the variance of the entire set of available physical properties (45,46). We have further shown (48) that the representation cannot be simplified, because any deviation from the full property factor representation results in a loss of physical information. No factor encodes information about any of the others, and all encode roughly equal amounts.…”
We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences: the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction. Despite functional similarity and structural homology, they exhibit low sequence identity. PFM sequence comparison demonstrates a statistically significant qualitative difference between the sequence of CheY and those of the other two proteins that is not found using conventional alignment methods. This difference is shown to be consonant with structural characteristics, using distance matrix comparisons. We also demonstrate that residues participating strongly in native contacts during unfolding are distributed differently in CheY than in the other two proteins. The PFM result is also in accord with dynamic simulation results of several types. Molecular dynamics simulations of all three proteins were carried out at several temperatures, and it is shown that the dynamics of CheY are predicted to differ from those of NT-NtrC and Spo0F. The predicted dynamic properties of the three proteins are in good agreement with experimentally determined B factors and with fluctuations predicted by the Gaussian network model. We pinpoint the differences between the PFM and traditional sequence comparisons and discuss the informatic basis for the ability of the PFM approach to detect physical differences between these sequences that are not apparent from traditional alignment-based comparison.
Keywords: amino acid physical properties | protein fluctuations | all-atom simulations
“…In this context, ProtDCal is a software package that transforms protein sequences or 3D‐structures into general‐purpose numerical descriptors, accounting for both global and local information. Due to its complementary performance with respect to other well‐established tools in the field like PROFEAT and PseAcc (later extended to Pse‐in‐one), ProtDCal has been used in a number of studies. Notable among them are the modeling of posttranslational modifications, the prediction of protein enzymatic function, the prediction of antimicrobial activity in peptides, the determination of residues critical for protein function, and the prediction of stability changes upon mutations.…”
Computational tools for the analysis of protein data and the prediction of biological properties are essential in life sciences and biomedical research. Here, we introduce ProtDCal‐Suite, a web server comprising a set of machine learning‐based methods for studying proteins. The main module of ProtDCal‐Suite is the ProtDCal software. ProtDCal translates the structural information of proteins into numerical descriptors that serve as input to machine‐learning techniques. The ProtDCal‐Suite server also incorporates an optional post‐processing stage that allows ranking and filtering the obtained descriptors by computing their Shannon entropy values across the input set of proteins. ProtDCal's encoding was used in the development of models for the prediction of specific protein properties. Thus, the other modules of ProtDCal‐Suite are protein analysis tools implemented using ProtDCal's descriptors. Among them are PPI‐Detect, for predicting the interaction likelihood of protein–protein and protein–peptide pairs, Enzyme Identifier, for identifying enzymes from amino acid sequences or 3D structures, and Pred‐NGlyco, for predicting N‐glycosylation sites. ProtDCal‐Suite is freely accessible at https://protdcal.zmb.uni-due.de.
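The abstract states only that descriptors are ranked and filtered by their Shannon entropy across the input proteins. A minimal sketch of that idea, assuming each descriptor's values are discretized into equal-width histogram bins (the binning ProtDCal‐Suite actually uses is not described here), could look like this; `shannon_entropy`, `rank_descriptors`, and the `min_entropy` threshold are hypothetical names, not the server's interface.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence, Tuple


def shannon_entropy(values: Sequence[float], n_bins: int = 10) -> float:
    """Entropy, in bits, of one descriptor's values over equal-width bins.

    Equal-width histogram binning is an assumption of this sketch.
    """
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin instead of an extra one.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def rank_descriptors(
    table: Dict[str, Sequence[float]],
    min_entropy: float = 0.0,
    n_bins: int = 10,
) -> List[Tuple[str, float]]:
    """Score each descriptor (name -> values across proteins) by entropy,
    drop those at or below min_entropy, and sort from highest to lowest."""
    scored = [(name, shannon_entropy(vals, n_bins)) for name, vals in table.items()]
    kept = [item for item in scored if item[1] > min_entropy]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

With the default threshold, descriptors that are constant across the input set score zero and are removed; higher entropy means the values spread over more bins and so have more potential to discriminate between proteins.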