Motivation
Identifying compound-protein interaction (CPI) is a crucial task in drug discovery and chemogenomics studies, and proteins without three-dimensional (3D) structure account for a large part of potential biological targets, which requires developing methods using only protein sequence information to predict CPI. However, sequence-based CPI models may face some specific pitfalls, including using inappropriate datasets, hidden ligand bias, and splitting datasets inappropriately, resulting in overestimation of their prediction performance.
Results
To address these issues, we here constructed new datasets specific for CPI prediction, proposed a novel transformer neural network named TransformerCPI, and introduced a more rigorous label reversal experiment to test whether a model learns true interaction features. TransformerCPI achieved much improved performance on the new experiments, and it can be deconvolved to highlight important interacting regions of protein sequences and compound atoms, which may contribute chemical biology studies with useful guidance for further ligand structural optimization.
Supplementary information
Supplementary data are available at Bioinformatics online.
Availability and implementation
https://github.com/lifanchen-simm/transformerCPI
Alterations of discoidin domain receptor1 (DDR1) may lead to increased production of inflammatory cytokines, making DDR1 an attractive target for inflammatory bowel disease (IBD) therapy. A scaffold-based molecular design workflow was established and performed by integrating a deep generative model, kinase selectivity screening and molecular docking, leading to a novel DDR1 inhibitor compound 2, which showed potent DDR1 inhibition profile (IC 50 = 10.6 ± 1.9 nM) and excellent selectivity against a panel of 430 kinases (S (10) = 0.002 at 0.1 μM). Compound 2 potently inhibited the expression of pro-inflammatory cytokines and DDR1 autophosphorylation in cells, and it also demonstrated promising oral therapeutic effect in a dextran sulfate sodium (DSS)-induced mouse colitis model.
Drug development based on target proteins has been a successful approach in recent decades. However, the conventional structure-based drug design (SBDD) pipeline is a complex, human-engineered process with multiple independently optimized steps. Here, we propose a sequence-to-drug concept for computational drug design based on protein sequence information by end-to-end differentiable learning. We validate this concept in three stages. First, we design TransformerCPI2.0 as a core tool for the concept, which demonstrates generalization ability across proteins and compounds. Second, we interpret the binding knowledge that TransformerCPI2.0 learned. Finally, we use TransformerCPI2.0 to discover new hits for challenging drug targets, and identify new target for an existing drug based on an inverse application of the concept. Overall, this proof-of-concept study shows that the sequence-to-drug concept adds a perspective on drug design. It can serve as an alternative method to SBDD, particularly for proteins that do not yet have high-quality 3D structures available.
Motivation
Breast cancer is one of the leading causes of cancer deaths among women worldwide. It is necessary to develop new breast cancer drugs because of the shortcomings of existing therapies. The traditional discovery process is time-consuming and expensive. Repositioning of clinically approved drugs has emerged as a novel approach for breast cancer therapy. However, serendipitous or experiential repurposing cannot be used as a routine method.
Results
In this study, we proposed a graph neural network model GraphRepur based on GraphSAGE for drug repurposing against breast cancer. GraphRepur integrated two major classes of computational methods, drug network-based and drug signature-based. The differentially expressed genes of disease, drug-exposure gene expression data, and the drug-drug links information were collected. By extracting the drug signatures and topological structure information contained in the drug relationships, GraphRepur can predict new drugs for breast cancer, outperforming previous state-of-the-art approaches and some classic machine learning methods. The high-ranked drugs have indeed been reported as new uses for breast cancer treatment recently.
Availability
The source code of our model and datasets are available at: https://github.com/cckamy/GraphRepur and https://figshare.com/articles/software/GraphRepur_Breast_Cancer_Drug_Repurposing/14220050
Supplementary information
Supplementary data are available at Bioinformatics online.
Reliable uncertainty quantification for statistical models is crucial in various downstream applications, especially for drug design and discovery where mistakes may incur a large amount of cost. This topic has therefore absorbed much attention and a plethora of methods have been proposed over the past years. The approaches that have been reported so far can be mainly categorized into two classes: distance-based approaches and Bayesian approaches. Although these methods have been widely used in many scenarios and shown promising performance with their distinct superiorities, being overconfident on out-of-distribution examples still poses challenges for the deployment of these techniques in real-world applications. In this study we investigated a number of consensus strategies in order to combine both distance-based and Bayesian approaches together with post-hoc calibration for improved uncertainty quantification in QSAR (Quantitative Structure–Activity Relationship) regression modeling. We employed a set of criteria to quantitatively assess the ranking and calibration ability of these models. Experiments based on 24 bioactivity datasets were designed to make critical comparison between the model we proposed and other well-studied baseline models. Our findings indicate that the hybrid framework proposed by us can robustly enhance the model ability of ranking absolute errors. Together with post-hoc calibration on the validation set, we show that well-calibrated uncertainty quantification results can be obtained in domain shift settings. The complementarity between different methods is also conceptually analyzed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.