Phyre2 is a suite of web-based tools for predicting and analyzing protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server, for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large numbers of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.
Determining the structure and function of a novel protein is a cornerstone of many aspects of modern biology. Over the past decades, a number of computational tools for structure prediction have been developed. It is critical that the biological community is aware of such tools and is able to interpret their results in an informed way. This protocol provides a guide to interpreting the output of structure prediction servers in general and one such tool in particular, the protein homology/analogy recognition engine (Phyre). New profile-profile matching algorithms have improved structure prediction considerably in recent years. Although the performance of Phyre is typical of many structure prediction systems using such algorithms, all these systems can reliably detect up to twice as many remote homologies as standard sequence-profile searching. Phyre is widely used by the biological community, with >150 submissions per day, and provides a simple interface to results. Phyre takes 30 min to predict the structure of a 250-residue protein.
INTRODUCTION
At present, over six million unique protein sequences have been deposited in the public databases, and this number is growing rapidly (http://www.ncbi.nlm.nih.gov/RefSeq/). Meanwhile, despite the progress of high-throughput structural genomics initiatives, just over 50,000 protein structures have so far been experimentally determined. This enormous disparity between the number of sequences and structures has driven research toward computational methods for predicting protein structure from sequence. Computational methods grounded in simulation of the folding process using only the sequence itself as input (the so-called ab initio or de novo approaches) have been pursued for decades and are showing some progress 1.
However, in general, these methods are either computationally intractable or show poor performance on everything except the smallest proteins (<100 amino acids) 1.
The most successful general approach for predicting the structure of proteins involves the detection of homologs of known three-dimensional (3D) structure, the so-called template-based homology modeling or fold recognition. These methods rely on the observation that the number of folds in nature appears to be limited and that many different remotely homologous protein sequences adopt remarkably similar structures 2. Thus, given a protein sequence of interest, one may compare this sequence with the sequences of proteins with experimentally determined structures. If a homolog can be found, an alignment of the two sequences can be generated and used directly to build a 3D model of the sequence of interest. The practical applications of protein structure prediction are many and varied, including guiding the development of functional hypotheses about hypothetical proteins 3, improving phasing signals in crystallography 4, selecting sites for mutagenesis 5 and the rational design of drugs 6.
Every 2 years an international blind trial of protein structure prediction techniques is held (Critical Assessmen...
3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite utilizes protein-structure prediction to provide structural models for proteins whose structures have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets, 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71 and 60%, respectively, results similar to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure. Predictions are visually displayed via an interactive Jmol applet. 3DLigandSite is available for use at http://www.sbg.bio.ic.ac.uk/3dligandsite.
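As an illustration of the benchmark figures quoted above, the following minimal sketch computes an MCC for a per-residue binding-site prediction. It is not the 3DLigandSite implementation: the function names and the toy residue sets are hypothetical, and because the abstract does not define "coverage" and "accuracy", the sketch assumes they correspond to recall and precision.

```python
import math


def confusion_counts(predicted, actual, n_residues):
    """Count TP, TN, FP, FN for sets of residue numbers (hypothetical helper)."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = n_residues - tp - fp - fn
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC formula; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def coverage_and_precision(tp, fp, fn):
    """Assumption: coverage = recall, accuracy = precision."""
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return coverage, precision


# Toy example: a 10-residue protein with made-up residue sets.
predicted_site = {1, 2, 3, 4}
true_site = {2, 3, 4, 5}
tp, tn, fp, fn = confusion_counts(predicted_site, true_site, n_residues=10)
print(matthews_corrcoef(tp, tn, fp, fn))
print(coverage_and_precision(tp, fp, fn))
```

MCC is a reasonable choice for this kind of benchmark because binding-site residues are usually a small minority of the protein, and MCC accounts for all four cells of the confusion matrix.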
Fourteen models were constructed and analyzed for the comparative modeling section of Critical Assessment of Techniques for Protein Structure Prediction (CASP4). Sequence identity between each target and the best possible parent(s) ranged between 55 and 13%, and the root-mean-square deviation between model and target ranged from 0.8 to 17.9 Å. In the fold recognition section, 10 of the 11 remote homologues were recognized. The modeling protocols are a combination of automated computer algorithms, 3D-JIGSAW (for comparative modeling) and 3D-PSSM (for fold recognition), with human intervention at certain critical stages. In particular, intervention is required to check superfamily assignment, the best possible parents from which to model, sequence alignments to those parents and take-off regions for modeling variable regions. There is now a convergence of algorithms for comparative modeling and fold recognition, particularly in the region of remote homology.
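The root-mean-square deviation quoted above can be illustrated with a short sketch. It assumes the model and target coordinates are already superimposed and that atoms are paired one-to-one (for example, matching Cα atoms); the abstract does not say which atoms or superposition procedure were used, and the function name and coordinates here are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. Å) between paired, pre-superimposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom of the model is displaced by 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, target))
```

In practice an optimal superposition step would normally precede this calculation; it is omitted here to keep the sketch dependency-free.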
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an “app” on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.