Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit a large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools.
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1037-6) contains supplementary material, which is available to authorized users.
3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite utilizes protein-structure prediction to provide structural models for proteins that have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets, 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71 and 60%, respectively, results similar to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure. Predictions are visually displayed via an interactive Jmol applet. 3DLigandSite is available for use at http://www.sbg.bio.ic.ac.uk/3dligandsite.
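For readers unfamiliar with the MCC quoted above, the following is a minimal sketch of the standard formula computed from a binary confusion matrix. It assumes each residue is labeled as binding or non-binding; the function name and the example counts are hypothetical and are not taken from the 3DLigandSite benchmark.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard Matthews correlation coefficient from confusion-matrix counts.

    tp/fn: true binding residues predicted as binding / non-binding.
    tn/fp: true non-binding residues predicted as non-binding / binding.
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for a 100-residue protein with 15 true binding residues.
    print(round(matthews_corrcoef(tp=10, tn=80, fp=5, fn=5), 3))
```

The MCC ranges from -1 to 1, with 1 for a perfect prediction and 0 for one no better than random, which is why it is often preferred over raw accuracy when binding residues are a small minority of the protein.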
To identify genetic factors influencing cardiac conduction, we carried out a genome-wide association study of electrocardiographic time intervals in 6,543 Indian Asians. We identified association of a nonsynonymous SNP, rs6795970, in SCN10A (P = 2.8 × 10^-15) with PR interval, a marker of cardiac atrioventricular conduction. Replication testing among 6,243 Indian Asians and 5,370 Europeans confirmed that rs6795970 (G>A) is associated with prolonged cardiac conduction (longer P-wave duration, PR interval and QRS duration, P = 10^-5 to 10^-20). SCN10A encodes NaV1.8, a sodium channel. We show that SCN10A is expressed in mouse and human heart tissue and that PR interval is shorter in Scn10a-/- mice than in wild-type mice. We also find that rs6795970 is associated with a higher risk of heart block (P < 0.05) and a lower risk of ventricular fibrillation (P = 0.01). Our findings provide new insight into the pathogenesis of cardiac conduction, heart block and ventricular fibrillation.
We carried out a genome-wide association study of hemoglobin levels in 16,001 individuals of European and Indian Asian ancestry. The most closely associated SNP (rs855791) results in a nonsynonymous (V736A) change in the serine protease domain of TMPRSS6 and a blood hemoglobin concentration 0.13 (95% CI 0.09–0.17) g/dl lower per copy of allele A (P = 1.6 × 10^-13). Our findings suggest that TMPRSS6, a regulator of hepcidin synthesis and iron handling, is crucial in hemoglobin level maintenance.
Chronic kidney disease (CKD), the result of permanent loss of kidney function, is a major global problem. We identify common genetic variants at chr2p12-p13, chr6q26, chr17q23 and chr19q13 associated with serum creatinine, a marker of kidney function (P = 10^-10 to 10^-15). SNPs rs10206899 (near NAT8, chr2p12-p13) and rs4805834 (near SLC7A9, chr19q13) were also associated with CKD. Our findings provide new insight into metabolic, solute and drug-transport pathways underlying susceptibility to CKD.
Many nonsynonymous single nucleotide polymorphisms (nsSNPs) are disease causing due to effects at protein-protein interfaces. We have integrated a database of the three-dimensional (3D) structures of human protein/protein complexes and the humsavar database of nsSNPs. We analyzed whether nsSNPs are located in the protein core, at protein-protein interfaces, or on the surface away from an interface. Disease-causing nsSNPs that do not occur in the protein core are preferentially located at protein-protein interfaces rather than surface noninterface regions when compared to random segregation. The disruption of the protein-protein interaction can be explained by a range of structural effects including the loss of an electrostatic salt bridge, the destabilization due to reduction of the hydrophobic effect, the formation of a steric clash, and the introduction of a proline altering the main-chain conformation.