The study and prediction of kinase function (kinomics) is of major importance for proteome research because kinases are so widely distributed. However, predicting protein function from the similarity between a functionally annotated 3D template and a query structure may fail, for instance, if a similar protein structure cannot be identified. Alternatively, function can be assigned using 3D-structural empirical parameters. In previous studies, we introduced parameters based on electrostatic entropy (Proteins 2004, 56, 715) and molecular vibration entropy (Bioinformatics 2003, 19, 2079) but ignored other important factors such as van der Waals (vdw) interactions. In the work described here, we define 3D-vdw entropies (°θ(k)) and use them for the first time to derive a classifier for protein kinases. The model correctly classifies 88.0% of proteins in training and more than 85.0% of proteins in validation studies. Principal components analysis of heterogeneous proteins demonstrated that °θ(k) codify information different from that described by other bulk or folding parameters. In additional validation experiments, the model recognized 129 out of 142 kinases (90.8%) and 592 out of 677 non-kinases (87.4%) not used above. This study provides a basis for further consideration of °θ(k) as parameters for the empirical search for structure-function relationships.
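The validation percentages quoted above follow directly from the reported counts. The sketch below recomputes them; the function name is illustrative, and the pooled accuracy is a derived figure that the abstract does not report.

```python
def validation_rates(kinases_found, kinases_total, nonkinases_found, nonkinases_total):
    """Return sensitivity, specificity and pooled accuracy as fractions."""
    sensitivity = kinases_found / kinases_total        # kinases recognized
    specificity = nonkinases_found / nonkinases_total  # non-kinases recognized
    # Pooled accuracy is derived here; it is not stated in the abstract.
    accuracy = (kinases_found + nonkinases_found) / (kinases_total + nonkinases_total)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


# Counts taken from the additional validation experiment in the abstract.
rates = validation_rates(129, 142, 592, 677)
for name, value in rates.items():
    print(f"{name}: {100 * value:.1f}%")
```

Reporting sensitivity and specificity separately matters here because the validation set is unbalanced (142 kinases against 677 non-kinases), so a pooled accuracy alone would be dominated by the non-kinase class.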
Variable selection is a procedure used to select the most important features, so that as much information as possible is obtained from a reduced number of features. The selection stage is crucial: the subsequent design of a quantitative structure-activity relationship (QSAR) model (regression or discriminant) will perform poorly if few significant features are selected. In the modern era of drug design, combinatorial chemistry and high-throughput screening have generated an unprecedented amount of experimental information. In addition, many molecular descriptors have been defined in the last two decades. All this information can be analyzed by QSAR techniques using adequate statistical procedures. These techniques and procedures should be fast, automated, and applicable to large data sets of structurally diverse compounds. Given the large number of variable selection techniques now available, identifying the best one is a very difficult task. The intention of this review is to summarize some of the present knowledge concerning variable selection methods applied to well-known statistical techniques such as linear regression, PLS, kNN, and artificial neural networks, with the aim of disseminating the advances in this important stage of QSAR model building.
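As a concrete illustration of what a variable selection step does, the following minimal sketch implements one simple filter approach: descriptors are ranked by absolute Pearson correlation with activity, and any descriptor highly correlated with one already chosen is skipped. This is only one of many possible techniques and is not taken from the review; the descriptor names, toy data, and cutoff values are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(descriptors, activity, max_features=2, redundancy_cutoff=0.9):
    """Pick descriptors most correlated with activity, skipping redundant ones."""
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson(descriptors[name], activity)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if len(selected) == max_features:
            break
        # Skip a descriptor that nearly duplicates one already selected.
        redundant = any(
            abs(pearson(descriptors[name], descriptors[kept])) > redundancy_cutoff
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical toy data: four descriptors for six compounds.
descriptors = {
    "logP": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "logP_copy": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],  # nearly duplicates logP
    "polarity": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "constant": [2.0] * 6,                        # carries no information
}
activity = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]

selected = select_descriptors(descriptors, activity)
print(selected)
```

A filter of this kind is fast and model-independent, which suits large descriptor pools, but it evaluates descriptors one at a time and can miss variables that are only informative in combination; wrapper methods such as stepwise regression or genetic algorithms address that at higher computational cost.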
Three-dimensional (3D) protein structures now frequently lack functional annotations because structures are being solved faster than experimental knowledge of biological activity accumulates. As a result, predicting structure-function relationships for proteins is an active research field in computational chemistry and has implications in medicinal chemistry, biochemistry and proteomics. In previous studies stochastic spectral moments were used to predict protein stability or function (González-Díaz, H. et al. Bioorg Med Chem 2005, 13, 323; Biopolymers 2005, 77, 296). Nevertheless, these moments take into consideration only electrostatic interactions and ignore other important factors such as van der Waals interactions. The present study introduces a new class of 3D structure molecular descriptors for folded proteins named the stochastic van der Waals spectral moments (°β(k)). Among many possible applications, recognition of kinases was selected because no previous computational chemistry studies in this area have been reported, despite the widespread distribution of kinases. The best linear model found was Kact = -9.44 °β(0)(c) + 10.94 °β(5)(c) - 2.40 °β(0)(i) + 2.45 °β(5)(m) + 0.73, where core (c), inner (i) and middle (m) refer to specific spatial protein regions. The model, with a high Matthews correlation coefficient (0.79), correctly classified 206 out of 230 proteins (89.6%) including both training and predicting series. An area under the ROC curve of 0.94 differentiates our model from a random classifier. A subsequent principal components analysis of 152 heterogeneous proteins demonstrated that °β(k) codifies information different from that of other descriptors used in protein computational chemistry studies. Finally, the model recognizes 110 out of 125 kinases (88.0%) in a virtual screening experiment, which can be considered an additional validation study (these proteins were not used in training or predicting series).
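The linear model quoted above can be evaluated as a plain weighted sum. The sketch below uses the coefficients from the abstract; the dictionary keys, the example descriptor values, and the decision threshold of zero are assumptions, since the abstract does not state how the Kact score is converted into a class label.

```python
# Coefficients as reported in the abstract; key names are illustrative.
COEFFICIENTS = {
    "beta0_core": -9.44,
    "beta5_core": 10.94,
    "beta0_inner": -2.40,
    "beta5_middle": 2.45,
}
INTERCEPT = 0.73


def kact_score(descriptors):
    """Evaluate the reported linear model for one protein."""
    return INTERCEPT + sum(
        weight * descriptors[name] for name, weight in COEFFICIENTS.items()
    )


def is_predicted_kinase(descriptors, threshold=0.0):
    """ASSUMPTION: scores above `threshold` are labeled kinase.

    The abstract does not give the decision rule, so zero is a placeholder.
    """
    return kact_score(descriptors) > threshold


# Hypothetical spectral-moment values, chosen only to exercise the function.
example = {
    "beta0_core": 1.0,
    "beta5_core": 1.0,
    "beta0_inner": 1.0,
    "beta5_middle": 1.0,
}
score = kact_score(example)
print(f"Kact = {score:.2f}, predicted kinase: {is_predicted_kinase(example)}")

# Overall accuracy recomputed from the counts in the abstract.
overall_accuracy = 206 / 230
print(f"overall accuracy: {100 * overall_accuracy:.1f}%")
```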
Twenty-three clovane derivatives, nine described here for the first time, bearing substituents on carbon C-2, have been synthesized and evaluated for their in vitro antifungal activity against the phytopathogenic fungus Botrytis cinerea. The results showed that compounds 9, 14, 16, and 18 bearing nitrogen atoms in the chain attached at C-2 displayed potent antifungal activity, whereas mercapto derivatives 13, 19, and 22 displayed low activity. The antifungal activity showed a clear structure-activity relationship (SAR) trend, which confirmed the importance of the nature of the C-2 chain for antifungal activity. On the basis of these observations, the metabolism of compounds 8 and 14 by the fungus B. cinerea, and the metabolism of other clovanes by this fungus, described previously, a pro-drug action mechanism for 2-alkoxyclovane compounds is proposed. Quantitative structure-activity relationship (QSAR) studies were performed to rationalize the results and to suggest further optimization, using a topological sub-structural molecular design (TOPS-MODE) approach. The model displayed good fit and predictive capability, describing 85.5% of the experimental variance, with a standard deviation of 9.502 and yielding high values of cross-validation determination coefficients (q²(CV-LOO) = 0.784 and q²(boot) = 0.673). The most significant variables were the spectral moments weighted by bond dipole moment (Dip), hydrophobicity (Hyd), and the combined dipolarity/polarizability Abraham molecular descriptor (Ab-pi2H).
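The leave-one-out cross-validated q² quoted above is conventionally computed as 1 - PRESS / SS_total, where PRESS sums the squared errors obtained when each compound is predicted by a model fitted without it. The sketch below shows that calculation for a one-descriptor linear model. It is a simplification, not the TOPS-MODE model itself: the data are invented, and using the full-sample mean in SS_total is an assumed convention.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(x, y):
    """Leave-one-out q2 = 1 - PRESS / SS_total for a one-descriptor model."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model with compound i held out, then predict it.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)  # ASSUMPTION: full-sample mean in SS_total
    ss_total = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and percent-inhibition activities.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [12.0, 21.0, 29.0, 42.0, 48.0, 61.0]

q2 = q2_loo(x, y)
print(f"q2 (LOO) = {q2:.3f}")
```

Because every prediction comes from a model that never saw the held-out compound, q² is lower than the fitted R² and gives a more honest estimate of predictive capability, which is why the abstract reports both the explained variance and the cross-validation coefficients.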