AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to AAindex as a new section. Accordingly, AAindex now consists of three sections: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid substitution matrix, and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).
We have analyzed 29 different published matrices of protein pairwise contact potentials (CPs) between amino acids derived from different sets of proteins, either crystallographic structures taken from the Protein Data Bank (PDB) or computer-generated decoys. Each of the CPs is similar to one of the two matrices derived in the work of Miyazawa and Jernigan (Proteins 1999;34:49-68). The CP matrices of the first class can be approximated with a correlation of order 0.9 by the formula $e_{ij} = h_i + h_j$, $1 \le i, j \le 20$, where the residue-type dependent factor $h$ is highly correlated with the frequency of occurrence of a given amino acid type inside proteins. Electrostatic interactions for the potentials of this class are almost negligible. In the potentials belonging to this class, the major contribution is the one-body transfer energy of the amino acid from water to the protein environment. Potentials belonging to the second class can be approximated with a correlation of 0.9 by the formula $e_{ij} = c_0 - h_i h_j + q_i q_j$, where $c_0$ is a constant, $h$ is highly correlated with the Kyte-Doolittle hydrophobicity scale, and a new, less dominant, residue-type dependent factor $q$ is correlated (~0.9) with amino acid isoelectric points pI. Including electrostatic interactions significantly improves the approximation for this class of potentials. While the high correlation between potentials of the first class and the hydrophobic transfer energies is well known, the fact that this approximation can work well also for the second class of potentials is a new finding. We interpret potentials of this class as representing energies of contact of amino acid pairs within an average protein environment. Proteins 2005;59:49-57.
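The first-class approximation, in which a contact potential is the sum of two one-body terms, can be illustrated with a short least-squares fit. This is a minimal sketch, not the authors' procedure: the function name `fit_additive`, the synthetic 20 x 20 matrix, and the noise level are all assumptions made for illustration. For a symmetric matrix, the least-squares solution over all entries is the row mean minus half the grand mean.

```python
import numpy as np

def fit_additive(e):
    """Least-squares fit of e_ij ~ h_i + h_j for a symmetric matrix e.

    Returns the fitted vector h and the Pearson correlation between the
    original and approximated entries (upper triangle, diagonal included).
    """
    e = np.asarray(e, dtype=float)
    # Closed-form minimizer of sum_ij (e_ij - h_i - h_j)^2 for symmetric e.
    h = e.mean(axis=1) - e.mean() / 2.0
    approx = h[:, None] + h[None, :]
    iu = np.triu_indices_from(e)
    corr = np.corrcoef(e[iu], approx[iu])[0, 1]
    return h, corr

# Synthetic stand-in for a contact-potential matrix (NOT real AAindex data).
rng = np.random.default_rng(0)
h_true = rng.normal(size=20)
noise = rng.normal(scale=0.1, size=(20, 20))
e_demo = h_true[:, None] + h_true[None, :] + (noise + noise.T) / 2.0

h_fit, corr = fit_additive(e_demo)
print(f"correlation between e_ij and h_i + h_j: {corr:.3f}")
```

With a real matrix from AAindex3 in place of `e_demo`, the returned correlation would indicate how well the additive form describes that potential.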
Subependymal giant cell astrocytomas (SEGAs) are rare brain tumors associated with tuberous sclerosis complex (TSC), a disease caused by mutations in TSC1 or TSC2, resulting in enhancement of mammalian target of rapamycin (mTOR) activity, dysregulation of cell growth, and tumorigenesis. Signaling via mTOR plays a role in multifaceted genomic responses, but its effectors in the brain are largely unknown. Therefore, gene expression profiling on four SEGAs was performed with Affymetrix Human Genome arrays. Of the genes differentially expressed in TSC, 11 were validated by real-time PCR on independent tumor samples and 3 SEGA-derived cultures. Expression of several proteins was confirmed by immunohistochemistry. The differentially-regulated proteins were mainly involved in tumorigenesis and nervous system development. ANXA1, GPNMB, LTF, RND3, S100A11, SFRP4, and NPTX1 genes were likely to be mTOR effector genes in SEGA, as their expression was modulated by an mTOR inhibitor, rapamycin, in SEGA-derived cells. Inhibition of mTOR signaling affected size of cultured SEGA cells but had no influence on their proliferation, morphology, or migration, whereas inhibition of both mTOR and extracellular signal-regulated kinase signaling pathways led to significant alterations of these processes. For the first time, we identified genes related to the occurrence of SEGA and regulated by mTOR and demonstrated an effective modulation of SEGA growth by pharmacological inhibition of both mTOR and extracellular signal-regulated kinase signaling pathways, which could represent a novel therapeutic approach.
A simple protein model restricted to the face-centered cubic lattice has been studied. The model interaction scheme includes attractive interactions between hydrophobic (H) residues, repulsive interactions between hydrophobic and polar (P) residues, and orientation-dependent P-P interactions. Additionally, there is a potential that favors extended beta-type conformations. A sequence has been designed that adopts a native structure, consisting of an antiparallel, six-member Greek-key beta-barrel with protein-like structural degeneracy. It has been shown that the proposed model is a minimal one, i.e., all the above listed types of interactions are necessary for cooperative (all-or-none) type folding to the native state. Simulations were performed via the Replica Exchange Monte Carlo method and the numerical data analyzed via a multihistogram method.
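The abstract does not describe how the Replica Exchange Monte Carlo simulation was implemented. As a hedged illustration of the general technique only, the sketch below shows the standard Metropolis criterion for swapping configurations between two replicas at inverse temperatures `beta_i` and `beta_j`. The function names and the example numbers are hypothetical.

```python
import math
import random

def swap_accept_prob(beta_i, beta_j, energy_i, energy_j):
    """Standard replica-exchange acceptance probability.

    Replica i (inverse temperature beta_i) currently has energy energy_i,
    replica j (beta_j) has energy energy_j. The swap is accepted with
    probability min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True if the two replicas should exchange configurations."""
    return rng.random() < swap_accept_prob(beta_i, beta_j, energy_i, energy_j)

# Hypothetical example: the colder replica (larger beta) holds the higher
# energy, so the exchange is always accepted.
print(swap_accept_prob(2.0, 1.0, 5.0, 3.0))
```

In a full simulation, swap attempts between neighboring temperatures would alternate with ordinary Monte Carlo moves within each replica.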
Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, which has elements of either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most 5 nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By knowing $r^2$ we can approximate a distance matrix of a protein with an expected RMSD value of about 4.5 Å. We can also explain the role of hydrophobic interactions for the protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. 
There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be ref...
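The statement that the spectral decomposition of the square distance matrix has at most 5 nonzero terms can be checked numerically. The sketch below is an illustration under stated assumptions: the coordinates are random points standing in for residue positions (not a real protein), and the function name and tolerance are arbitrary choices. The bound follows because, for points in three dimensions, each entry is $r_i^2 + r_j^2 - 2\,x_i \cdot x_j$, so the matrix is a sum of terms of rank 1, 1 and at most 3.

```python
import numpy as np

def count_nonzero_eigenvalues(coords, rel_tol=1e-8):
    """Count numerically nonzero eigenvalues of the square distance matrix.

    coords: (N, 3) array of point coordinates.
    rel_tol: eigenvalues smaller than rel_tol * max|eigenvalue| count as zero.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    D = (diff ** 2).sum(axis=-1)          # D_ij = squared distance r_ij^2
    eigvals = np.linalg.eigvalsh(D)       # D is real and symmetric
    tol = rel_tol * np.abs(eigvals).max()
    return int((np.abs(eigvals) > tol).sum())

# Hypothetical stand-in for residue coordinates: 50 random points in 3D.
rng = np.random.default_rng(1)
coords = rng.normal(size=(50, 3))
n_nonzero = count_nonzero_eigenvalues(coords)
print("nonzero eigenvalues:", n_nonzero)
```

Replacing `coords` with C-alpha coordinates parsed from a structure file would apply the same check to an actual protein.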
The standard Markov chain Monte Carlo method of estimating an expected value is to generate a Markov chain which converges to the target distribution and then compute correlated sample averages. In many applications the quantity of interest $\theta$ is represented as a product of expected values, $\theta = \mu_1 \cdots \mu_k$, and a natural estimator is a product of averages. To increase the confidence level, we can compute a median of independent runs. The goal of this paper is to analyze such an estimator $\hat{\theta}$, i.e. an estimator which is a 'median of products of averages' (MPA). Sufficient conditions are given for $\hat{\theta}$ to have fixed relative precision at a given level of confidence, that is, to satisfy $P(|\hat{\theta} - \theta| \le \theta\varepsilon) \ge 1 - \alpha$. Our main tool is a new bound on the mean-square error, valid also for nonreversible Markov chains on a finite state space.
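The structure of a 'median of products of averages' estimator can be sketched in a few lines. This is an illustrative toy, not the paper's construction: each factor is assumed to be the stationary mean of a two-state chain on {0, 1} with hypothetical transition probabilities `p` (0 to 1) and `q` (1 to 0), whose stationary mean is p / (p + q). The run length and number of runs are arbitrary and are not derived from the paper's bounds.

```python
import numpy as np

def chain_average(p, q, n_steps, rng):
    """Sample average of a two-state {0, 1} Markov chain started at 0."""
    x = 0
    total = 0
    for _ in range(n_steps):
        if x == 0:
            x = 1 if rng.random() < p else 0
        else:
            x = 0 if rng.random() < q else 1
        total += x
    return total / n_steps

def mpa_estimate(factors, n_steps, n_runs, seed=0):
    """Median over independent runs of the product of per-factor averages."""
    rng = np.random.default_rng(seed)
    products = []
    for _ in range(n_runs):
        averages = [chain_average(p, q, n_steps, rng) for p, q in factors]
        products.append(np.prod(averages))
    return float(np.median(products))

# Hypothetical factors (p, q); the true product is known for this toy case.
factors = [(0.3, 0.2), (0.5, 0.5), (0.1, 0.3)]
theta_true = float(np.prod([p / (p + q) for p, q in factors]))
theta_hat = mpa_estimate(factors, n_steps=5000, n_runs=11)
print(f"true: {theta_true:.4f}  estimate: {theta_hat:.4f}")
```

Taking the median rather than the mean of the runs is what raises the confidence level: a single bad run cannot move the median far.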
Background: Affymetrix GeneChip microarrays are popular platforms for expression profiling in two types of studies: detection of differential expression computed by p-values of t-test and estimation of fold change between analyzed groups. There are many different preprocessing algorithms for summarizing Affymetrix data. The main goal of these methods is to remove effects of non-specific hybridization, and to optimally combine information from multiple probes annotated to the same transcript. The methods are benchmarked by comparison with reference methods, such as quantitative reverse-transcription PCR (qRT-PCR). Results: We present a comprehensive analysis of agreement between Affymetrix GeneChip and qRT-PCR results. We analyzed the influence of filtering by fraction Present calls introduced by J.N. McClintick and H.J. Edenberg (2006) and 2 mapping procedures: updated probe sets definitions proposed by Dai et al. (2005) and our "naive mapping" method. Because of evolution of genome sequence annotations since the time when microarrays were designed, we also studied the effect of the annotation release date. These comparisons were prepared for 6 popular preprocessing algorithms (MAS5, PLIER, RMA, GC-RMA, MBEI, and MBEImm) in the 2 above-mentioned types of studies. We used data sets from 6 independent biological experiments. As a measure of reproducibility of microarray and qRT-PCR values, we used linear and rank correlation coefficients. Conclusions: We show that filtering by fraction Present calls increased correlations for all 6 preprocessing algorithms. We observed the difference in performance of PM-MM and PM-only methods: using MM probes increased correlations in fold change studies, but PM-only methods proved to perform better in detection of differential expression. We recommend using GC-RMA for detection of differential expression and PLIER for estimation of fold change. The use of the more recent annotation improves the results in both types of studies, encouraging re-analysis of old data.