BackgroundStudy of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner.Methods/Principal FindingsTo realize this, drug compounds are encoded with functional groups and proteins encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein- coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, with each to predict the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively.Conclusion/SignificanceOur results indicate that the network prediction system thus established is quite promising and encouraging.
Antimicrobial peptides (AMPs) represent a class of natural peptides that form a part of the innate immune system, and this kind of ‘nature's antibiotics’ is quite promising for solving the problem of increasing antibiotic resistance. In view of this, it is highly desired to develop an effective computational method for accurately predicting novel AMPs because it can provide us with more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating the sequence alignment method and the feature selection method. It was observed that, the overall jackknife success rate by the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Mathews correlation coefficient is 0.73, indicating a good prediction. Moreover, it is indicated by an in-depth feature analysis that the results are quite consistent with the previously known knowledge that some amino acids are preferential in AMPs and that these amino acids do play an important role for the antimicrobial activity. For the convenience of most experimental scientists who want to use the prediction method without the interest to follow the mathematical details, a user-friendly web-server is provided at http://amp.biosino.org/.
The metabolic stability is a very important idiosyncracy of proteins that is related to their global flexibility, intramolecular fluctuations, various internal dynamic processes, as well as many marvelous biological functions. Determination of protein's metabolic stability would provide us with useful information for in-depth understanding of the dynamic action mechanisms of proteins. Although several experimental methods have been developed to measure protein's metabolic stability, they are time-consuming and more expensive. Reported in this paper is a computational method, which is featured by (1) integrating various properties of proteins, such as biochemical and physicochemical properties, subcellular locations, network properties and protein complex property, (2) using the mRMR (Maximum Relevance & Minimum Redundancy) principle and the IFS (Incremental Feature Selection) procedure to optimize the prediction engine, and (3) being able to identify proteins among the four types: “short”, “medium”, “long”, and “extra-long” half-life spans. It was revealed through our analysis that the following seven characters played major roles in determining the stability of proteins: (1) KEGG enrichment scores of the protein and its neighbors in network, (2) subcellular locations, (3) polarity, (4) amino acids composition, (5) hydrophobicity, (6) secondary structure propensity, and (7) the number of protein complexes the protein involved. It was observed that there was an intriguing correlation between the predicted metabolic stability of some proteins and the real half-life of the drugs designed to target them. These findings might provide useful insights for designing protein-stability-relevant drugs. The computational method can also be used as a large-scale tool for annotating the metabolic stability for the avalanche of protein sequences generated in the post-genomic age.
Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small 76-amino acid protein) is attached to lysine on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Since ubiquitination is rapid and reversible, it is time-consuming and labor-intensive to identify ubiquitination sites using conventional experimental approaches. To efficiently discover lysine-ubiquitination sites, a sequence-based predictor of ubiquitination site was developed based on nearest neighbor algorithm. We used the maximum relevance and minimum redundancy principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimized 456 features. The Mathew's correlation coefficient (MCC) of our ubiquitination site predictor achieved 0.142 by jackknife cross-validation test on a large benchmark dataset. In independent test, the MCC of our method was 0.139, higher than the existing ubiquitination site predictor UbiPred and UbPred. The MCCs of UbiPred and UbPred on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around lysine plays an important role in ubiquitination site prediction. What's more, disorder and ubiquitination have a strong relevance. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to potential therapeutic strategies in the future.
BackgroundWith the huge amount of uncharacterized protein sequences generated in the post-genomic age, it is highly desirable to develop effective computational methods for quickly and accurately predicting their functions. The information thus obtained would be very useful for both basic research and drug development in a timely manner.Methodology/Principal FindingsAlthough many efforts have been made in this regard, most of them were based on either sequence similarity or protein-protein interaction (PPI) information. However, the former often fails to work if a query protein has no or very little sequence similarity to any function-known proteins, while the latter had similar problem if the relevant PPI information is not available. In view of this, a new approach is proposed by hybridizing the PPI information and the biochemical/physicochemical features of protein sequences. The overall first-order success rates by the new predictor for the functions of mouse proteins on training set and test set were 69.1% and 70.2%, respectively, and the success rate covered by the results of the top-4 order from a total of 24 orders was 65.2%.Conclusions/SignificanceThe results indicate that the new approach is quite promising that may open a new avenue or direction for addressing the difficult and complicated problem.
Given a compound, how can we effectively predict its biological function? It is a fundamentally important problem because the information thus obtained may benefit the understanding of many basic biological processes and provide useful clues for drug design. In this study, based on the information of chemical-chemical interactions, a novel method was developed that can be used to identify which of the following eleven metabolic pathway classes a query compound may be involved with: (1) Carbohydrate Metabolism, (2) Energy Metabolism, (3) Lipid Metabolism, (4) Nucleotide Metabolism, (5) Amino Acid Metabolism, (6) Metabolism of Other Amino Acids, (7) Glycan Biosynthesis and Metabolism, (8) Metabolism of Cofactors and Vitamins, (9) Metabolism of Terpenoids and Polyketides, (10) Biosynthesis of Other Secondary Metabolites, (11) Xenobiotics Biodegradation and Metabolism. It was observed that the overall success rate obtained by the method via the 5-fold cross-validation test on a benchmark dataset consisting of 3,137 compounds was 77.97%, which is much higher than 10.45%, the corresponding success rate obtained by the random guesses. Besides, to deal with the situation that some compounds may be involved with more than one metabolic pathway class, the method presented here is featured by the capacity able to provide a series of potential metabolic pathway classes ranked according to the descending order of their likelihood for each of the query compounds concerned. Furthermore, our method was also applied to predict 5,549 compounds whose metabolic pathway classes are unknown. Interestingly, the results thus obtained are quite consistent with the deductions from the reports by other investigators. It is anticipated that, with the continuous increase of the chemical-chemical interaction data, the current method will be further enhanced in its power and accuracy, so as to become a useful complementary vehicle in annotating uncharacterized compounds for their biological functions.
More and more people are concerned by the risk of unexpected side effects observed in the later steps of the development of new drugs, either in late clinical development or after marketing approval. In order to reduce the risk of the side effects, it is important to look out for the possible xenobiotic responses at an early stage. We attempt such an effort through a prediction by assuming that similarities in microarray profiles indicate shared mechanisms of action and/or toxicological responses among the chemicals being compared. A large time course microarray database derived from livers of compound-treated rats with thirty-four distinct pharmacological and toxicological responses were studied. The mRMR (Minimum-Redundancy-Maximum-Relevance) method and IFS (Incremental Feature Selection) were used to select a compact feature set (141 features) for the reduction of feature dimension and improvement of prediction performance. With these 141 features, the Leave-one-out cross-validation prediction accuracy of first order response using NNA (Nearest Neighbor Algorithm) was 63.9%. Our method can be used for pharmacological and xenobiotic responses prediction of new compounds and accelerate drug development.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.