Abstract: GASS results were compared with those catalogued in the Catalytic Site Atlas (CSA) on four different datasets, and with two other methods: amino acid pattern search for substructures and motifs (ASSAM) and catalytic site identification (CatSId). The results show GASS can correctly identify >90% of the templates searched. Experiments were also run using data from the CASP 10 substrate binding site prediction competition, where GASS ranked fourth among the 18 methods considered.
“…The tests reported in this section consider datasets of catalytic sites, although MeGASS can be also used for subtract binding site identification [10]. We start with catalytic sites because they are smaller and easier to deal with.…”
Section: Results (mentioning)
confidence: 99%
“…In order to tackled these problems, we recently proposed GASS (Genetic Active Site Search) [10], which does not impose any restrictions such as those aforementioned and, above all, can precisely identify the chain where the residues of the active site are located. Difficulties in correctly identifying the chain where the active site residues are located is one of the main drawbacks of the current methods, as showed in [10].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, indirect comparisons with small sets of proteins used by them has already shown they are at least as good as and most time better than ASSAM and CatSid[10].…”
Active sites are regions on the enzyme surface designed to interact with other molecules. Given their importance to enzyme function, active site amino acids are more conserved during evolution than the sequence as a whole, and can be a useful source of information for function prediction. For this reason, great effort has been put into identifying active sites in proteins. Most methods for this purpose use an active site template from a protein of known function to search for similar structures in proteins of unknown function. In this direction, we recently proposed GASS (Genetic Active Site Search), a method based on an evolutionary algorithm to search for active sites in proteins. Although the method obtained very accurate results, its main strength and its main weakness both stem from using only the spatial distance from the template to the protein to evaluate candidate sites. This paper therefore proposes MeGASS, a multiobjective version of GASS that also considers the depth of the residues when looking for active sites. This is important, as active sites are known to lie close to the protein surface to allow interactions with ligands. Results showed that the depth attribute improves on the results of GASS, and that its role in the method is worth further investigation.
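The abstract above describes evaluating candidate sites on two criteria at once (spatial distance to the template and residue depth). One common way to handle two objectives is Pareto dominance, sketched below. This is an illustration only, not the MeGASS implementation: the `Candidate` type, its field names, the example values, and the choice to minimize both objectives (treating shallower residues as better) are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical candidate site; field names are illustrative assumptions."""
    residues: Tuple[str, ...]  # residue labels making up the candidate site
    distance: float            # spatial distance to the template (lower is better)
    depth: float               # mean residue depth (assumed: lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and better on one."""
    no_worse = a.distance <= b.distance and a.depth <= b.depth
    strictly_better = a.distance < b.distance or a.depth < b.depth
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up values, purely for illustration.
site_a = Candidate(("HIS57", "ASP102", "SER195"), distance=1.0, depth=5.0)
site_b = Candidate(("HIS40", "ASP90", "SER180"), distance=2.0, depth=3.0)
site_c = Candidate(("HIS12", "ASP33", "SER77"), distance=2.5, depth=6.0)

front = pareto_front([site_a, site_b, site_c])
```

Here `site_c` is worse than `site_a` on both objectives and is dropped, while `site_a` and `site_b` trade distance against depth and both remain on the front. A single-objective search that ranks by distance alone would always prefer `site_a`.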
“…Another method that predicts active site pockets is AADS that uses geometric information on cavities in addition to physicochemical properties of residues. Some methods have implemented genetic algorithms, which use structural information as well as sequence and network based properties in combination with machine learning to identify active site residues. More recently, protein dynamics was also used as a predictor for active sites.…”
Section: Introduction (mentioning)
confidence: 99%
“…18 Some methods have implemented genetic algorithms, which use structural information as well as sequence and network based properties in combination with machine learning to identify active site residues. 19,20 More recently, protein dynamics was also used as a predictor for active sites. Glantz-Gashai and co-workers revealed that normal modes can expose active sites, and they used changes in solvent accessibilities to predict active site residues.…”
Binding sites in proteins can be either specifically functional binding sites (active sites) that bind specific substrates with high affinity or regulatory binding sites (allosteric sites), that modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR‐Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and Matthews correlation coefficient (MCC) of 0.68, whereas the less well‐defined allosteric sites are predicted at a lower level with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR‐Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source.
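The abstract above reports performance as a Matthews correlation coefficient (MCC). As a reminder of what that number measures, here is a minimal sketch of the standard MCC formula computed from a binary confusion matrix. The function name `mcc` and the convention of returning 0.0 when the denominator is zero are choices made for this example, not details taken from AR-Pred.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to
    +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0.0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported for residue-level site prediction, where non-site residues greatly outnumber site residues.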
Protein-ligand binding site prediction methods aim to predict, from amino acid sequence, protein-ligand interactions, putative ligands, and ligand binding site residues using either sequence information, structural information, or a combination of both. In silico characterization of protein-ligand interactions has become extremely important to help determine a protein's functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analysis, such as drug discovery. Thus, in silico prediction of protein-ligand interactions must be utilized to aid in functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein-ligand interactions, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss, in detail, our cutting-edge web-server method, FunFOLD for the structurally informed prediction of protein-ligand interactions. Furthermore, we provide a step-by-step guide on using the FunFOLD web server and FunFOLD3 downloadable application, along with some real world examples, where the FunFOLD methods have been used to aid functional elucidation.