To identify the amino acids that determine protein structure and function, it is useful to rank them by their relative importance. Previous approaches fall into two groups: those that rely on statistical inference and those that focus on phylogenetic analysis. Here, we introduce a class of hybrid methods that combine evolutionary and entropic information from multiple sequence alignments. A detailed analysis of the insulin receptor kinase domain and tests on proteins that are well characterized experimentally show the hybrids' greater robustness with respect to the input choice of sequences, as well as improved sensitivity and specificity of prediction. This is a further step toward proteome-scale analysis of protein structure and function.
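As a rough illustration of how entropic and evolutionary information might be combined, the sketch below sums the within-group Shannon entropy of an alignment column over successive groupings of the sequences. It is not the scoring formula of the paper: the function names, the toy alignment, and the `partitions` structure (standing in for the subdivisions a phylogenetic tree would supply) are all assumptions made for the example.

```python
import math
from collections import Counter

def column_entropy(residues):
    """Shannon entropy (in nats) of the residue types in one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def hybrid_score(alignment, partitions, col):
    """Sum, over partition levels, of the mean within-group entropy of a column.

    `partitions` is a hypothetical stand-in for a phylogenetic tree: each level
    is a list of groups, and each group is a list of sequence indices.
    Lower scores mean the column is more conserved within groups.
    """
    score = 0.0
    for groups in partitions:
        within = [column_entropy([alignment[i][col] for i in group])
                  for group in groups]
        score += sum(within) / len(within)
    return score

# Toy data (hypothetical, not taken from the paper).
alignment = ["ACDK", "ACEK", "AGDR", "AGER"]
partitions = [
    [[0, 1, 2, 3]],      # root: all sequences in one group
    [[0, 1], [2, 3]],    # one split into two subfamilies
]

# Rank columns from most to least important (lowest score first).
ranking = sorted(range(len(alignment[0])),
                 key=lambda c: hybrid_score(alignment, partitions, c))
```

In this toy case, columns 1 and 2 have the same plain entropy, but column 1 is conserved within each subfamily and therefore ranks ahead of column 2. That difference is the kind of signal a purely entropic ranking would miss.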
We applied support vector machines to sequences in order to classify all protein residues into those that are part of a protein interface and those that are not. For the first time, evolutionary information was used as one of the attributes, and the inclusion of evolutionary importance rankings improves the classification. Leave-one-out cross-validation experiments show that prediction accuracy reaches 64%.
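The leave-one-out protocol mentioned above can be sketched in a few lines. To keep the example dependency-free, a nearest-centroid classifier is swapped in for the support vector machine, so this shows only the validation loop, not the classifier used in the study. The feature vectors and labels are made-up placeholders (for instance, an evolutionary rank and one other per-residue attribute).

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x.

    This is a simple stand-in classifier, NOT a support vector machine.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda l: dist2(centroids[l], x))

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical per-residue features; label 1 = interface, 0 = non-interface.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]

accuracy = leave_one_out_accuracy(X, y)
```

Because every sample is predicted by a model that never saw it, the resulting accuracy estimates performance on unseen residues rather than the fit to the training set.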
Evolutionary trace report_maker offers a new type of service for researchers investigating the function of novel proteins. It pools, from different sources, information about protein sequence, structure, and elementary annotation, and onto that background superimposes inference about the evolutionary behavior of individual residues, using the real-valued evolutionary trace method. As its only input it takes a Protein Data Bank identifier or UniProt accession number, and it returns a human-readable document in PDF format, supplemented by the original data needed to reproduce the results quoted in the report.
Starting from the hypothesis that evolutionarily important residues form a spatially limited cluster in a protein's native fold, we discuss the possibility of detecting a non-native structure based on the absence of such clustering. The relevant residues are determined using the Evolutionary Trace method. We propose a quantity to measure clustering of the selected residues on the structure and show that the exact values for its average and variance over several ensembles of interest can be found. This enables us to study the behavior of the associated z-scores. Since our approach rests on an analytic result, it proves to be general, customizable, and computationally fast. We find that clustering is indeed detectable in a large representative protein set. Furthermore, we show that non-native structures tend to achieve lower residue-clustering z-scores than those attained by the native folds. The most important conclusion that we draw from this work is that consistency between structural and evolutionary information, manifested in clustering of key residues, imposes powerful constraints on the conformational space of a protein.
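A minimal sketch of a residue-clustering z-score follows, under stated assumptions. The clustering quantity here is simply the number of selected residue pairs closer than a distance cutoff, which may differ from the quantity proposed in the paper. The reference mean and variance are obtained by brute-force enumeration of all equally sized selections, whereas the paper derives them analytically. The coordinates, the cutoff, and all function names are hypothetical.

```python
import itertools
import math

def contact_pairs(coords, cutoff):
    """Residue index pairs (i, j), i < j, whose distance is below cutoff."""
    return {(i, j)
            for i, j in itertools.combinations(range(len(coords)), 2)
            if math.dist(coords[i], coords[j]) < cutoff}

def clustering_weight(selected, contacts):
    """Number of contacts in which both residues belong to the selection."""
    sel = set(selected)
    return sum(1 for i, j in contacts if i in sel and j in sel)

def clustering_z_score(selected, coords, cutoff):
    """Z-score of the observed weight against all same-size selections.

    Exhaustive enumeration is feasible only for toy sizes; an analytic mean
    and variance, as in the paper, avoid this cost.
    """
    contacts = contact_pairs(coords, cutoff)
    observed = clustering_weight(selected, contacts)
    weights = [clustering_weight(s, contacts)
               for s in itertools.combinations(range(len(coords)), len(selected))]
    mean = sum(weights) / len(weights)
    var = sum((w - mean) ** 2 for w in weights) / len(weights)
    if var == 0:
        return 0.0  # every selection scores the same, so there is no signal
    return (observed - mean) / math.sqrt(var)

# Toy structure: residues 0-2 form a tight cluster, the rest are far apart.
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0),
          (0, 10, 0), (0, 0, 10), (10, 10, 0), (10, 0, 10)]

z_clustered = clustering_z_score([0, 1, 2], coords, cutoff=2.0)
z_scattered = clustering_z_score([3, 4, 5], coords, cutoff=2.0)
```

Here the selection that forms a compact cluster gets a positive z-score and the scattered one a negative z-score, mirroring the contrast between native and non-native structures described above.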