Background Determining binding affinity in protein-protein interactions is important in the discovery and design of novel therapeutics and mutagenesis studies. Determination of binding affinity of proteins in the formation of protein complexes requires sophisticated, expensive and time-consuming experimentation which can be replaced with computational methods. Most computational prediction techniques require protein structures that limit their applicability to protein complexes with known structures. In this work, we explore sequence-based protein binding affinity prediction using machine learning. Method We have used protein sequence information instead of protein structures along with machine learning techniques to accurately predict the protein binding affinity. Results We present our findings that the true generalization performance of even the state-of-the-art sequence-only predictor is far from satisfactory and that the development of machine learning methods for binding affinity prediction with improved generalization performance is still an open problem. We have also proposed a sequence-based novel protein binding affinity predictor called ISLAND which gives better accuracy than existing methods over the same validation set as well as on external independent test dataset. A cloud-based webserver implementation of ISLAND and its python code are available at https://sites.google.com/view/wajidarshad/software. Conclusion This paper highlights the fact that the true generalization performance of even the state-of-the-art sequence-only predictor of binding affinity is far from satisfactory and that the development of effective and practical methods in this domain is still an open problem.
Quantifying the hemolytic activity of peptides is a crucial step in the discovery of novel therapeutic peptides. Computational methods are attractive in this domain due to their ability to guide wet-lab experimental discovery or screening of peptides based on their hemolytic activity. However, existing methods are unable to accurately model various important aspects of this predictive problem such as the role of N/C-terminal modifications, D- and L- amino acids, etc. In this work, we have developed a novel neural network-based approach called HemoNet for predicting the hemolytic activity of peptides. The proposed method captures the contextual importance of different amino acids in a given peptide sequence using a specialized feature embedding in conjunction with SMILES-based fingerprint representation of N/C-terminal modifications. We have analyzed the predictive performance of the proposed method using stratified cross-validation in comparison with previous methods, non-redundant cross-validation as well as validation on external peptides and clinical antimicrobial peptides. Our analysis shows the proposed approach achieves significantly better predictive performance (AUC-ROC of 88%) in comparison to previous approaches (HemoPI and HemoPred with AUC-ROC of 73%). HemoNet can be a useful tool in the search for novel therapeutic peptides. The python implementation of the proposed method is available at the URL: https://github.com/adibayaseen/HemoNet.
Motivation Machine-learning-based prediction of compound–protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publication with increasing methodological sophistication claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance. Results We systematically analyze the impact of several factors affecting generalization performance of CPI predictors that are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) synthesizing negative examples in absence of experimentally verified negative examples and (iii) alignment of evaluation protocol and performance metrics with real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches by other researchers as well as a simple kernel-based baseline, we have found that effective assessment of generalization performance of CPI predictors requires careful control over similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that random pairing for generating synthetic negative examples for training and performance evaluation results in models with better generalization in comparison to more sophisticated strategies used in existing studies. Our analyses indicate that using proposed experiment design strategies can offer significant improvements for CPI prediction leading to effective target compound screening for drug repurposing and discovery of putative chemical ligands of SARS-CoV-2-Spike and Human-ACE2 proteins. Availability and implementation Code and supplementary material available at https://github.com/adibayaseen/HKRCPI. Supplementary information Supplementary data are available at Bioinformatics online.
Please refer to published version for the most recent bibliographic citation information. If a published version is known of, the repository item page linked to above, will contain details on accessing it.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.