Effective interresidue contact energies for proteins in solution are estimated from the numbers of residue-residue contacts observed in crystal structures of globular proteins by means of the quasi-chemical approximation with an approximate treatment of the effects of chain connectivity. Employing a lattice model, each residue of a protein is assumed to occupy a site in a lattice and vacant sites are regarded as occupied by an effective solvent molecule whose size is equal to the average size of a residue. A basic assumption is that the average characteristics of residue-residue contacts formed in a large number of protein crystal structures reflect actual differences of interactions among residues, as if there were no significant contribution from the specific amino acid sequence in each protein or from intraresidue and short-range interactions. Then, taking account of the effects of the chain connectivity only as imposing a limit to the size of the system, i.e., the number of lattice sites or the number of effective solvent molecules in the system, the system is regarded as a mixture of unconnected residues and effective solvent molecules. The quasi-chemical approximation, that contact pair formation resembles a chemical reaction, is applied to this system to obtain formulas that relate the statistical averages of the numbers of contacts to the contact energies. The number of effective solvent molecules for each protein is chosen to yield the total number of residue-residue contacts equal to its expected value for the hypothetical case of hard sphere interactions among residues and effective solvent molecules; the expected number of residue-residue contacts at this condition has been crudely estimated by means of a freely jointed chain distribution and an expansion originating in hard sphere interactions.
Each residue is represented by the center of its side chain atom positions, and contacts among residues and effective solvent molecules are defined to be those pairs within 6.5 Å, a distance that has been chosen on the basis of the observed radial distribution of residues; nearest-neighbor pairs along a chain are explicitly excluded in counting contacts. Coordination numbers, for each type of residue as well as for solvent molecules, are estimated from the mean volume of each type of residue and used to evaluate the numbers of residue-solvent and solvent-solvent contacts from the numbers of residue-residue contacts. The estimated values of contact energies have reasonable residue-type dependences, reflecting residue distributions in protein crystals; nonpolar-residue-in and polar-residue-out are seen as well as the segregation of those residue groups. In addition, there is a linear relationship between the average contact energies for nonpolar residues and their hydrophobicities reported by Nozaki and Tanford; however, the magnitudes on average are about twice as large. The relevance of the results to protein folding and other applications is discussed.
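The contact definition in the abstract above (side-chain centers within 6.5 Å, nearest neighbors along the chain excluded) can be illustrated with a minimal Python sketch. The function name `count_contacts`, the input layout, the single-chain restriction, and the choice of an inclusive cutoff (`<=`) are assumptions made for illustration; the abstract does not specify them, and the sketch covers only residue-residue contacts, not the effective solvent or the energy estimation.

```python
import math
from itertools import combinations

CUTOFF = 6.5  # contact distance in angstroms, as stated in the abstract


def count_contacts(centers, residue_types, cutoff=CUTOFF):
    """Count residue-residue contacts per unordered pair of residue types.

    centers       -- list of (x, y, z) side-chain centers, in chain order
                     (assumed to come from a single chain)
    residue_types -- list of residue type labels, same order as centers
    """
    counts = {}
    for i, j in combinations(range(len(centers)), 2):
        # Nearest-neighbor pairs along the chain are excluded from counting.
        if j - i == 1:
            continue
        # Inclusive cutoff is an assumption; "within 6.5 A" is all the text says.
        if math.dist(centers[i], centers[j]) <= cutoff:
            key = tuple(sorted((residue_types[i], residue_types[j])))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Called on side-chain centers extracted from a structure file, this yields the per-type contact counts that serve as input to a quasi-chemical estimate of the kind the abstract describes.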
The frequency of amino acid substitutions, relative to the frequency expected by chance, decreases linearly with the increase in physico-chemical differences between amino acid pairs involved in a substitution. This correlation does not apply to abnormal human hemoglobins. Since abnormal hemoglobins mostly reflect the process of mutation rather than selection, the correlation manifest during protein evolution between substitution frequency and physico-chemical difference in amino acids can be attributed to natural selection. Outside of 'abnormal' proteins, the correlation also does not apply to certain regions of proteins characterized by rapid rates of substitution. In these cases again, except for the largest physico-chemical differences between amino acid pairs, the substitution frequencies seem to be independent of the physico-chemical parameters. The elimination of the substitutions involving the largest physico-chemical differences can once more be attributed to natural selection. For smaller physico-chemical differences, natural selection, if it is operating in the polypeptide regions, must be based on parameters other than those examined.
Lotus tetragonolobus agglutinin (LTA) binds preferentially to early embryonic cells in the mouse. Affinity-purified antibodies raised against LTA receptors from embryonal carcinoma cells were used to screen a lambda gt11 expression library of F9 embryonal carcinoma cells, resulting in detection of a cDNA clone specifying a new glycoprotein termed "basigin." The glycoprotein has been suggested to be a transmembrane protein, and was found to be a new member of the immunoglobulin (Ig) superfamily. The molecular weight of basigin was largely in the range between 43,000 and 66,000, while that of the peptide portion with a putative signal sequence was inferred to be about 30,000. Significant levels of basigin mRNA were detected not only in embryonal carcinoma cells, but also in mouse embryos at 9-15 days of gestation and in various organs of the adult mouse. The Ig-like domain of basigin is unique, since it has strong homology to both the beta-chain of major histocompatibility class II antigen and the Ig V domain. The number of amino acids between the two conserved cysteine residues is intermediate between those of the Ig V and C domains. Therefore, basigin is an interesting protein in connection with the molecular evolution of the superfamily.
Pairwise contact energies for 20 types of residues are estimated self-consistently from the actual observed frequencies of contacts with regression coefficients that are obtained by comparing "input" and predicted values with the Bethe approximation for the equilibrium mixtures of residues interacting. This is premised on the fact that correlations between the "input" and the predicted values are sufficiently high although the regression coefficients themselves can depend to some extent on protein structures as well as interaction strengths. Residue coordination numbers are optimized to obtain the best correlation between "input" and predicted values for the partition energies. The contact energies self-consistently estimated this way indicate that the partition energies predicted with the Bethe approximation should be reduced by a factor of about 0.3 and the intrinsic pairwise energies by a factor of about 0.6. The observed distribution of contacts can be approximated with a small relative error of only about 0.08 as an equilibrium mixture of residues, if many proteins were employed to collect more than 20,000 contacts. Including repulsive packing interactions and secondary structure interactions further reduces the relative errors. These new contact energies are demonstrated by threading to have improved their ability to discriminate native structures from other non-native folds.
Probabilities of all possible correspondences of residues in aligning two proteins are evaluated by assuming that the statistical weight of each alignment is proportional to the exponential of its total similarity score. Based on such probabilities, a probability alignment that includes the most probable correspondences is proposed. In the case of highly similar sequence pairs, the probability alignments agree with the maximum similarity alignments, i.e., the alignments with the maximum similarity score. Significant correspondences in the probability alignments are those whose probabilities are > 0.5. The probability alignment method is applied to a few protein pairs, and results indicate that such highly probable correspondences in the probability alignments are probably correct correspondences that agree with the structural alignments and that incorrect correspondences in the maximum similarity alignments are usually insignificant correspondences in the probability alignments. The root mean square deviations in superimposition of corresponding residues tend to be smaller for significant correspondences in the probability alignments than for all correspondences in the maximum similarity alignments, indicating that incorrect correspondences in the maximum similarity alignments tend to be insignificant correspondences in probability alignments. This fact is also confirmed in 109 protein pairs that are similar to each other with sequence identities between 90 and 35%. In addition, the probability alignment method may better predict correct correspondences than the maximum similarity alignment method. Probability alignments do, of course, depend on a scoring scheme but are less sensitive to the value of parameters such as gap penalties. The present probability alignment method is useful for constructing reliable alignments based on the probabilities of correspondences and can be used with any scoring scheme.
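The idea of weighting every alignment by the exponential of its score and summing those weights to get correspondence probabilities can be sketched with a forward-backward dynamic program. This is a minimal illustration under stated assumptions, not the method of the paper: it uses global alignments with a linear gap penalty, and the names `match_probabilities`, `score`, `gap`, and `temperature` are hypothetical choices that do not come from the abstract.

```python
import math


def match_probabilities(a, b, score, gap=2.0, temperature=1.0):
    """Return P[i][j], the probability that a[i] is aligned with b[j].

    Each global alignment gets weight exp(total_score / temperature), where
    total_score sums score(x, y) over aligned pairs and subtracts `gap` per
    gapped position (linear gaps: an illustrative assumption).
    """
    n, m = len(a), len(b)
    wg = math.exp(-gap / temperature)
    w = [[math.exp(score(x, y) / temperature) for y in b] for x in a]

    # Forward sums: F[i][j] = total weight of alignments of a[:i] with b[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += F[i - 1][j - 1] * w[i - 1][j - 1]
            if i > 0:
                total += F[i - 1][j] * wg
            if j > 0:
                total += F[i][j - 1] * wg
            F[i][j] = total

    # Backward sums: B[i][j] = total weight of alignments of a[i:] with b[j:].
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            total = 0.0
            if i < n and j < m:
                total += w[i][j] * B[i + 1][j + 1]
            if i < n:
                total += wg * B[i + 1][j]
            if j < m:
                total += wg * B[i][j + 1]
            B[i][j] = total

    Z = F[n][m]  # partition function: total weight of all alignments
    return [[F[i][j] * w[i][j] * B[i + 1][j + 1] / Z for j in range(m)]
            for i in range(n)]
```

Correspondences with probability above 0.5 would be the "significant" ones in the sense used above. For sequences of realistic length the raw weights can overflow floating point, so a practical version would likely work in log space or rescale each row.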
In mice, 12 germ-line DH genes belonging to three different families (DQ52, DSP2 and DFL16) have been identified. The DH genes other than DQ52 are clustered in the 60-kb region located between VH and JH genes. Since there are seven DH gene families (DHQ52, DXP, DA, DK, DN, DM and DLR) in humans, we tried to identify new DH gene families in the 60-kb region using human DH gene probes. Mouse and human DH genes showing the highest similarity were mouse DFL16 genes and human DA genes. Southern hybridization of the mouse clones covering the 60-kb region with human DH probes did not detect any other DH genes. Nucleotide sequence analysis of the 4.0-kb fragment containing the DFL16.1 gene confirmed this conclusion. Comparison of the 12 germ-line DH genes and more than 150 somatic DH sequences also indicated that there are no more germ-line DH genes in the mouse genome. Moreover, comparison of nucleotide sequences of DFL16.1 and DSP2.2 genes and their surrounding regions suggests that both DH gene families originate from the same primordial DH gene. Using the flanking sequences of both DH genes, the divergence date between DFL16 and DSP2 genes was estimated at around 37 million years ago.