We present a new atom type classification system for use in atom-based calculation of partition coefficient (log P) and molar refractivity (MR), designed in part to address published concerns of previous atomic methods. The 68 atomic contributions to log P have been determined by fitting an extensive training set of 9920 molecules, with r² = 0.918 and σ = 0.677. A separate set of 3412 molecules was used for the determination of contributions to MR with r² = 0.997 and σ = 1.43. Both calculations are shown to have high predictive ability.
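As a rough illustration of what an atom-based calculation means here, the sketch below sums per-atom contributions over a molecule whose atoms have already been assigned types. The type labels and numeric values are hypothetical placeholders, not the 68 contributions reported in the abstract, and the atom typing is done by hand.

```python
from collections import Counter

# Hypothetical atom-type contributions to log P. These labels and numbers
# are placeholders for illustration only, not values from the paper.
LOGP_CONTRIB = {
    "C.aliphatic": 0.15,
    "H.on_carbon": 0.12,
    "O.hydroxyl": -0.30,
    "H.on_heteroatom": -0.25,
}

def atomic_logp(atom_types):
    """Sum per-atom contributions for a molecule given as a list of atom-type labels."""
    counts = Counter(atom_types)
    return sum(LOGP_CONTRIB[t] * n for t, n in counts.items())

# Ethanol, typed by hand under the hypothetical scheme above.
ethanol = (
    ["C.aliphatic"] * 2
    + ["H.on_carbon"] * 5
    + ["O.hydroxyl", "H.on_heteroatom"]
)
print(round(atomic_logp(ethanol), 3))
```

The same additive form would apply to MR with a separate table of contributions.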
In an earlier article, the need was demonstrated for atomic physicochemical properties for three-dimensional structure-directed quantitative structure-activity relationships, and it was shown how atomic parameters can be developed for successfully evaluating the molecular octanol-water partition coefficient, which is a measure of hydrophobicity. In this work we report more refined atomic values of octanol-water partition coefficients derived from nearly twice the number of compounds. Carbon, hydrogen, oxygen, nitrogen, sulfur and halogens are divided into 110 atom types, of which 94 atomic values are evaluated from 830 molecules by least squares. These values gave a standard deviation of 0.470 and a correlation coefficient of 0.931. These parameters predicted the octanol-water partition coefficient of 125 compounds with a standard deviation of 0.520 and a correlation coefficient of 0.870. There is only a correlation coefficient of 0.432 between the atomic octanol-water partition coefficients and the atomic contributions to molar refractivity over the 93 atom types used for both properties. This suggests that both parameters can be used simultaneously to model intermolecular interactions. We evaluated the CNDO/2 gross atomic charge distribution over several molecules to check the validity of our classification. We found that the charge density on the heteroatoms in conjugated systems is strongly affected by the presence of similar atoms in the conjugation, which suggests it should be incorporated as a separate parameter in evaluating the partition coefficient.
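The least-squares evaluation of atomic values described above can be sketched as a linear regression of observed log P on atom-type counts. The data below are synthetic and the four "true" contributions are invented solely to generate them; only the fitting procedure, and the standard deviation and correlation coefficient it reports, is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: rows are molecules, columns are atom types,
# entries are how many atoms of each type the molecule contains.
n_molecules, n_types = 50, 4
counts = rng.integers(0, 6, size=(n_molecules, n_types)).astype(float)

# Hypothetical "true" contributions, used only to generate fake log P values.
true_contrib = np.array([0.2, -0.4, 0.1, 0.6])
logp_obs = counts @ true_contrib + rng.normal(0.0, 0.05, size=n_molecules)

# Ordinary least squares: contributions a minimizing ||counts @ a - logp_obs||.
fitted, *_ = np.linalg.lstsq(counts, logp_obs, rcond=None)

pred = counts @ fitted
resid = logp_obs - pred
# Residual standard deviation with (molecules - parameters) degrees of freedom.
std_dev = np.sqrt(resid @ resid / (n_molecules - n_types))
# Correlation coefficient between observed and fitted log P.
corr = np.corrcoef(logp_obs, pred)[0, 1]

print(fitted, std_dev, corr)
```

Assessing predictive ability, as the abstract does with 125 held-out compounds, would mean applying `fitted` to a count matrix for molecules not used in the fit.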
In the study of globular protein conformations, one customarily measures the similarity in three-dimensional structure by the root-mean-square deviation (RMSD) of the Cα atomic coordinates after optimal rigid body superposition. Even when the two protein structures each consist of a single chain having the same number of residues so that the matching of Cα atoms is obvious, it is not clear how to interpret the RMSD. A very large value means they are dissimilar, and zero means they are identical in conformation, but at what intermediate values are they particularly similar or clearly dissimilar? While many workers in the field have chosen arbitrary cutoffs, and others have judged values of RMSD according to the observed distribution of RMSD for random structures, we propose a self-referential, non-statistical standard. We take two conformers to be intrinsically similar if their RMSD is smaller than that when one of them is mirror inverted. Because the structures considered here are not arbitrary configurations of point atoms, but are compact, globular, polypeptide chains, our definition is closely related to similarity in radius of gyration and overall chain folding patterns. Being strongly similar in our sense implies that the radii of gyration must be nearly identical, the root-mean-square deviation in interatomic distances is linearly related to RMSD, and the two chains must have the same general fold. Only when the RMSD exceeds this level can parts of the polypeptide chain undergo nontrivial rearrangements while remaining globular. This enables us to judge when a prediction of a protein's conformation is "correct except for minor perturbations", or when the ensemble of protein structures deduced from NMR experiments is "basically in mutual agreement".
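The mirror-image criterion above can be sketched as follows. The abstract does not name a superposition algorithm, so this example assumes the standard Kabsch (SVD-based) method restricted to proper rotations; the function names and the test helix are illustrative only.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between (N, 3) arrays after optimal translation and proper rotation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so that reflections are never used for superposition.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ P.T).T - Q
    return np.sqrt((diff ** 2).sum() / len(P))

def mirror(X):
    """Mirror-invert a structure by flipping the sign of the z coordinate."""
    return np.asarray(X, dtype=float) * np.array([1.0, 1.0, -1.0])

def intrinsically_similar(P, Q):
    """True if P is closer to Q than to the mirror image of Q."""
    rmsd = kabsch_rmsd(P, Q)
    rmsd_mirror = kabsch_rmsd(P, mirror(Q))
    return bool(rmsd < rmsd_mirror), rmsd, rmsd_mirror

# Illustrative chiral test structure: a right-handed helix of 30 points.
t = np.linspace(0.0, 4.0 * np.pi, 30)
helix = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])

# A rotated and translated copy should be judged similar to the original.
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = helix @ Rz.T + np.array([5.0, -2.0, 1.0])

print(intrinsically_similar(helix, moved))
print(intrinsically_similar(helix, mirror(helix)))
```

Restricting the fit to proper rotations matters here: if reflections were allowed, a structure and its mirror image would superpose exactly and the criterion would lose its meaning.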
Protein structures are routinely compared by their root-mean-square deviation (RMSD) in atomic coordinates after optimal rigid body superposition. What is not so clear is the significance of different RMSD values, particularly above the customary arbitrary cutoff for obvious similarity of 2-3 Å. Our earlier work argued for an intrinsic cutoff for protein similarity that varied with the number of residues in the polypeptide chains being compared. Here we introduce a new measure, rho, of structural similarity based on RMSD that is independent of the sizes of the molecules involved, or of any other special properties of molecules. When rho is less than 0.4-0.5, protein structures are visually recognized to be obviously similar, but the mathematically pleasing intrinsic cutoff of rho < 1.0 corresponds to overall similarity in folding motif at a level not usually recognized until smoothing of the polypeptide chain path makes it striking. When the structures are scaled to unit radius of gyration and equal principal moments of inertia, the comparisons are even more universal, since they are no longer obscured by differences in overall size and ellipticity. With increasing chain length, the distribution of rho for pairs of random structures is skewed to higher values, but the value for the best 1% of the comparisons rises only slowly with the number of residues. This level is close to an intrinsic cutoff between similar and dissimilar comparisons, namely the maximal scaled rho possible for the two structures to be more similar to each other than one is to the other's mirror image. The intrinsic cutoff is independent of the number of residues or points being compared. For proteins having fewer than 100 residues, the 1% rho falls below the intrinsic cutoff, so that for very small proteins, geometrically significant similarity can often occur by chance. We believe these ideas will be helpful in judging success in NMR structure determination and protein folding modeling.
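The first of the two scaling steps mentioned above, scaling to unit radius of gyration, can be sketched as below. This assumes equally weighted points (for example, Cα atoms treated as unit masses), which the abstract does not state; the equalization of principal moments of inertia and the definition of rho itself are not given in the abstract and are not attempted here.

```python
import numpy as np

def radius_of_gyration(X):
    """Root-mean-square distance of points from their centroid (equal weights assumed)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())

def scale_to_unit_rg(X):
    """Center a structure at the origin and rescale it so its radius of gyration is 1."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    return centered / radius_of_gyration(X)

# Illustrative coordinates: a random 40-point structure in arbitrary units.
rng = np.random.default_rng(1)
coords = rng.normal(scale=8.0, size=(40, 3))

scaled = scale_to_unit_rg(coords)
print(radius_of_gyration(coords), radius_of_gyration(scaled))
```

After this step, an RMSD between two scaled structures no longer depends on their overall size, which is the motivation the abstract gives for scaling.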