With the exponential growth in the determination of protein sequences and structures via genome sequencing and structural genomics efforts, there is a growing need for reliable computational methods to determine the biochemical function of these proteins. This paper reviews the efforts to address the challenge of annotating the function at the molecular level of uncharacterized proteins. While sequence- and three-dimensional-structure-based methods for protein function prediction have been reviewed previously, the recent trends in local structure-based methods have received less attention. These local structure-based methods are the primary focus of this review. Computational methods have been developed to predict the residues important for catalysis and the local spatial arrangements of these residues can be used to identify protein function. In addition, the combination of different types of methods can help obtain more information and better predictions of function for proteins of unknown function. Global initiatives, including the Enzyme Function Initiative (EFI), COMputational BRidges to EXperiments (COMBREX), and the Critical Assessment of Function Annotation (CAFA), are evaluating and testing the different approaches to predicting the function of proteins of unknown function. These initiatives and global collaborations will increase the capability and reliability of methods to predict biochemical function computationally and will add substantial value to the current volume of structural genomics data by reducing the number of absent or inaccurate functional annotations.
A scoring method for the prediction of catalytically important residues in enzyme structures is presented and used to examine the participation of distal residues in enzyme catalysis. Scores are based on the Partial Order Optimum Likelihood (POOL) machine learning method, using computed electrostatic properties, surface geometric features, and information obtained from the phylogenetic tree as input features. Predictions of distal residue participation in catalysis are compared with experimental kinetics data from the literature on variants of the featured enzymes; some additional kinetics measurements are reported for variants of Pseudomonas putida nitrile hydratase (ppNH) and for Escherichia coli alkaline phosphatase (AP). The multilayer active sites of P. putida nitrile hydratase and of human phosphoglucose isomerase are predicted by the POOL log ZP scores, as is the single-layer active site of P. putida ketosteroid isomerase. The log ZP score cutoff utilized here results in over-prediction of distal residue involvement in E. coli alkaline phosphatase. While fewer experimental data points are available for P. putida mandelate racemase and for human carbonic anhydrase II, the POOL log ZP scores properly predict the previously reported participation of distal residues.
Edited by Norma M. Allewell
Caspases are cysteine-aspartic proteases involved in the regulation of programmed cell death (apoptosis) and a number of other biological processes. Despite overall similarities in structure and active-site composition, caspases show striking selectivity for particular protein substrates. Exosites are emerging as one of the mechanisms by which caspases can recruit, engage, and orient these substrates for proper hydrolysis. Following computational analyses and database searches for candidate exosites, we utilized site-directed mutagenesis to identify a new exosite in caspase-6 at the hinge between the disordered N-terminal domain (NTD), residues 23-45, and core of the caspase-6 structure. We observed that substitutions of the tri-arginine patch Arg-42-Arg-44 or the R44K cancer-associated mutation in caspase-6 markedly alter its rates of protein substrate hydrolysis. Notably, turnover of protein substrates but not of short peptide substrates was affected by these exosite alterations, underscoring the importance of this region for protein substrate recruitment. Hydrogen-deuterium exchange MS-mediated interrogation of the intrinsic dynamics of these enzymes suggested the presence of a substrate-binding platform encompassed by the NTD and the 240's region (containing residues 236-246), which serves as a general exosite for caspase-6-specific substrate recruitment. In summary, we have identified an exosite on caspase-6 that is critical for protein substrate recognition and turnover and therefore highly relevant for diseases such as cancer in which caspase-6-mediated apoptosis is often disrupted, and in neurodegeneration in which caspase-6 plays a central role.
Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectin/glucanase (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein BACOVA_03626 from Bacteroides ovatus (PDB 3ON6) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacillus halodurans (PDB 2RDY), has a predicted active site that matches well with a known α-L-galactosidase. 
In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0) is shown to have local structural similarity at the predicted active site with the known members of the GH16 family, with the closest match to the endoglucanase subfamily. The method discussed herein can predict whether an SG protein is correctly or incorrectly annotated and can sometimes provide a reliable functional annotation. Examples of application of the method across folds, comparing active sites between two proteins of different structural folds, are also given.
As a result of high‐throughput protein structure initiatives, over 14,400 protein structures have been solved by Structural Genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP‐Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP‐Func method to our previously reported method, Structurally Aligned Local Sites of Activity (SALSA), using the Ribulose Phosphate Binding Barrel (RPBB), 6‐Hairpin Glycosidase (6‐HG), and Concanavalin A‐like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP‐Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP‐Func methods to predict function. Forty‐one SG proteins in the RPBB superfamily, nine SG proteins in the 6‐HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community.
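The idea of describing an active site as a graph rather than in Cartesian coordinates can be illustrated with a minimal sketch. This is not the GRASP-Func implementation: the residue labels, coordinates, the 8 Å cutoff, and the helper names `site_graph` and `rotate_z_and_shift` are all assumptions made for illustration. The sketch only shows one property that may motivate such a representation, namely that a distance-based graph does not change when the structure is rotated or translated, so no structural superposition is needed before two sites are compared.

```python
import math
from itertools import combinations


def site_graph(residues, cutoff=8.0):
    """Return edges (pairs of residue labels) for residues within `cutoff` angstroms.

    `residues` maps a label to an (x, y, z) coordinate. The cutoff value is an
    illustrative assumption, not a parameter taken from the abstract.
    """
    edges = set()
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(sorted(residues.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            edges.add((name_a, name_b))
    return edges


def rotate_z_and_shift(xyz, angle, shift):
    """Rigid-body motion: rotate about the z axis by `angle`, then translate by `shift`."""
    x, y, z = xyz
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


# Hypothetical predicted active-site residues with made-up coordinates.
site = {
    "ASP10": (0.0, 0.0, 0.0),
    "HIS57": (5.0, 0.0, 0.0),
    "SER195": (5.0, 6.0, 0.0),
    "GLY200": (20.0, 0.0, 0.0),
}

# The same site in a different orientation and position.
moved = {name: rotate_z_and_shift(xyz, 1.1, (3.0, -7.0, 12.0)) for name, xyz in site.items()}

graph_original = site_graph(site)
graph_moved = site_graph(moved)
```

In this toy case `graph_original` and `graph_moved` contain the same three edges, while the underlying coordinates differ completely.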
Members of the Crotonase superfamily, a mechanistically diverse family of proteins that share a conserved quaternary structure, can often catalyze more than one reaction. However, the spectrum of activity for its members has not been well studied. We report on measured crotonase and hydrolase activity for eight structural genomics (SG) proteins from the Crotonase superfamily plus two previously characterized proteins, intended as controls: human enoyl CoA hydratase (ECH) and Anabaena β-diketone hydrolase. Like most of the 15,000+ SG protein structures deposited in the Protein Data Bank (PDB), the eight SG proteins are of unknown or uncertain biochemical function. The functional characterization of the eight SG proteins is guided by the Structurally Aligned Local Sites of Activity (SALSA), a local structure-based computational approach to functional annotation. For human ECH, the turnover number for hydrolase activity is threefold higher than that for ECH activity, although the catalytic efficiency is 160-fold higher for ECH. Three SG proteins originally annotated as ECHs were predicted by SALSA to be hydrolases and are observed to have higher catalytic efficiencies for hydrolase activity than for ECH activity, on par with the previously characterized hydrolase. Among the five SG proteins predicted by SALSA to be ECHs, all but one also show some hydrolase activity; all five exhibit lower ECH activity than the human ECH with respect to the crotonyl-CoA substrate. Here, we show examples demonstrating that SALSA can correct functional misannotations even within enzyme families that display promiscuous activity.
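The two figures quoted for human ECH look contradictory at first but are mutually consistent under Michaelis-Menten kinetics. As a rough worked relation, assuming that "turnover number" means k_cat, that "catalytic efficiency" means k_cat/K_M, and that "threefold" and "160-fold" are read as ratios of about 3 and 160 (none of which the abstract states explicitly), the implied ratio of Michaelis constants for the two activities is:

```latex
\[
\frac{K_M^{\mathrm{hyd}}}{K_M^{\mathrm{ECH}}}
  = \frac{k_{\mathrm{cat}}^{\mathrm{hyd}}}{k_{\mathrm{cat}}^{\mathrm{ECH}}}
    \cdot
    \frac{\left(k_{\mathrm{cat}}/K_M\right)^{\mathrm{ECH}}}{\left(k_{\mathrm{cat}}/K_M\right)^{\mathrm{hyd}}}
  \approx 3 \times 160 = 480
\]
```

Under these assumptions, the higher turnover for hydrolase activity would be outweighed by a much larger K_M for the hydrolase substrate, which is why the efficiency still favors ECH activity. The value 480 is an inference from the two reported ratios, not a number given in the abstract.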
DNA is constantly subjected to damage from endogenous and exogenous sources. Replicative DNA polymerases are typically unable to replicate damaged DNA, but specialized DNA polymerases in the Y family possess this ability. Escherichia coli has two Y family polymerases that are specialized to bypass damage when copying DNA in a process called translesion synthesis (TLS). DinB is one of these polymerases and is involved in bypassing deoxyguanosine adducts at the N2 position. Humans have four Y family polymerases, including DNA polymerase kappa. E. coli DinB and human pol kappa both bypass minor groove adducts and are inhibited by major groove adducts. However, pol kappa is more efficient in copying past DNA damage in the extension step of translesion synthesis. In order to probe the importance of particular residues in the extension step of TLS, the computational tool POOL was utilized. This method identified active site residues and residues previously observed to be important for activity. POOL also predicted more distant residues that do not have direct contact with substrates but may have catalytic importance; these residues are in different regions of DinB and pol kappa. To study the contribution of these distal residues to the extension step of TLS, DinB and pol kappa variants with mutations at the predicted distal positions were constructed and are being assayed for bypass of damage. We have identified variants with a range of activity on undamaged and damaged DNA; in particular, several mutations in the DinB little finger domain severely reduce activity.
Support or Funding Information
Support from NSF‐MCB‐1517290, American Cancer Society RSG‐12‐161‐01‐DMC, and the PhRMA Foundation (predoctoral fellowship in informatics awarded to CLM).
This abstract is from the Experimental Biology 2018 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.
There are currently over 14,300 Structural Genomics (SG) protein structures deposited in the PDB by protein structure initiatives. However, most of these SG proteins have unknown or putative function annotations. This accumulated structural information represents a tremendous contribution to structural biology and genomics. Still, the addition of accurate functional annotations for these SG proteins would add substantial value to this information. Our approach to functional annotation and validation incorporates predicting functional assignments through structure‐based computed chemical properties and local structure matching followed by biochemical validation. This research focuses on four superfamilies: Crotonase, Ribulose Phosphate Binding Barrel, 6‐Hairpin Glycosidase, and Concanavalin A‐like Lectins and Glucanases. First, Partial Order Optimum Likelihood (POOL) is used to predict computationally the catalytically important residues in each protein structure. Next, Structurally Aligned Local Sites of Activity (SALSA) develops spatially‐localized consensus signatures for the proteins of known function in each functional family within each superfamily based on POOL‐predicted residues and functionally characterized residues of importance. Then, the POOL‐predicted residues for each SG protein are compared to each consensus signature and scored to determine their degree of similarity at the local active site. Finally, we introduce a new, computationally faster method for sorting protein superfamilies and annotating protein function using local structure matching in graph representation: Graph Representation of Active Sites for Prediction of Function (GRASP‐Func). Sets of tetrahedra are generated through Delaunay triangulation for each protein structure using the alpha carbon atoms of each residue.
Then, sets of proteins with matched tetrahedra are grouped together and images are generated showing the relationship of each protein (node) and its neighbors (edges) with similar active sites. We compare SALSA and GRASP‐Func and show that both methods correctly sort the superfamilies into their respective functional families. Both methods also make similar functional predictions for the SG proteins, with GRASP‐Func performing in far less time. Thus GRASP‐Func enables large‐scale comparisons and functional assignments within and across superfamilies. Finally, we are able to test these predictions biochemically to confirm function. Biochemical data for the Crotonase Superfamily show that while proteins have some promiscuous functionality, our methods predict the correct dominant function for each protein tested. The goal of this project is to provide a validated approach to functional annotation to enable applications from drug target identification to green chemistry and biofuel production.
Support or Funding Information
Support from NSF‐CHE‐1305655, NSF‐MCB‐1158176, NSF‐MCB‐1517290, PhRMA Foundation (Predoctoral Fellowship in Informatics awarded to CLM), NSF‐GRFP (JSL), MathWorks, Inc., and American Cancer Society Research Scholar Grant RSG‐12‐161‐01‐DMC (PJB).
This abstract is from the Experimental Biology 2018 Meeting. There is no full text article associated with this abstract published in The FASEB Journal.
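The grouping step described in this abstract, where proteins are nodes and edges link proteins with matched tetrahedra, can be sketched roughly as follows. This is an illustrative toy, not the authors' code: the protein names, the four-letter tetrahedron signatures (standing in for the residue types at the four vertices of a tetrahedron), the `min_shared` threshold, and the use of connected components as groups are all assumptions. The abstract does not say how tetrahedra are matched or how groups are delimited, and the Delaunay triangulation itself is taken as already done.

```python
from itertools import combinations


def build_graph(signatures, min_shared=2):
    """Link two proteins when they share at least `min_shared` tetrahedron signatures.

    `signatures` maps a protein name to a set of signature strings. Both the
    signature encoding and the threshold are hypothetical choices.
    """
    graph = {name: set() for name in signatures}
    for a, b in combinations(signatures, 2):
        if len(signatures[a] & signatures[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_groups(graph):
    """Return the connected components of the graph as sorted lists of names."""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return sorted(groups)


# Made-up example: three proteins of known function and one SG query protein.
signatures = {
    "known_A1": {"DEHK", "DEHY", "EHKY"},
    "known_A2": {"DEHK", "DEHY", "CGST"},
    "sg_query": {"DEHK", "EHKY", "AALL"},
    "known_B1": {"CGST", "NQRW", "FFWY"},
}

graph = build_graph(signatures)
groups = connected_groups(graph)
```

With these made-up inputs, the query protein lands in the same group as `known_A1` and `known_A2`, while `known_B1` stays on its own. In a sketch like this, a functional prediction for the SG protein would be read off from the known proteins in its group.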