Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening

Cortés-Ciriano, Isidro; Firth, Nicholas C.; Bender, Andreas; Watson, Oliver P.

doi:10.1021/acs.jcim.8b00376

Cited by 30 publications

(44 citation statements)

References 80 publications

“…We are also happy to clarify that the error bars in Figs. 3 and 4, and the ± values indicated in the text or in the tables, all correspond to the standard deviation over the relevant population, consistent with standard practice [5, 6].…”

Section: In-depth Comments (supporting)

confidence: 61%

Reply to “Missed opportunities in large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery”

et al. 2019

In response to Krstajic’s letter to the editor concerning our published paper, we here take the opportunity to reply, to re-iterate that no errors in our work were identified, to provide further details, and to re-emphasise the outputs of our study. Moreover, we highlight that all of the data are freely available for the wider scientific community (including the aforementioned correspondent) to undertake follow-on studies and comparisons.

“…We firstly compared the performance on the test set of DNN, using dropout probabilities in all layers of either 0.1, 0.25 or 0.5, and RF models (Figure 2 in line with models reported in the literature for similar data sets [38]. Hence, the models obtained here are likely approaching the upper performance limit which can be obtained for the datasets used, which is also a likely factor behind the very similar performance obtained across methods.…”

Section: Results (mentioning)

confidence: 99%

Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout

Cortés-Ciriano, Bender. 2019. J. Chem. Inf. Model.

Self Cite

While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g. precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets an ensemble of predictions was generated. The residuals and absolute errors in prediction for the validation set were then used to compute prediction errors for test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, and to thereby generate reliable prediction errors for deep Neural Networks.

Machine Learning

Data Splitting. The data sets were randomly split into a training set (70% of the data), a validation set (15%), and a test set (15%). For each data set, the training set was used to train a given network, whereas the validation set served to monitor the performance of the network during the training phase. In the case of RF models, both the training and validation sets were used for model training. The predictive power of the final RF and DNN models was evaluated on the test set. The above split (and associated model training and testing) was repeated 20 times with random data set assignments.

Deep Neural Networks (DNN). DNNs were trained using the Python library PyTorch [48]. We defined four hidden layers, composed of 1000, 1000, 100 and 10 nodes, respectively. The number of neurons in each layer was selected to be smaller than the input fingerprint size to reduce the chances of overfitting [49]. Rectified linear unit (ReLU) activation was used in all cases. The training data were processed in batches of size equal to 15% of the number of instances. We used Stochastic Gradient Descent with Nesterov momentum, which was set to 0.9 and kept constant during the training phase [50]. The networks were trained over 4,000 epochs, and early stopping was used in all cases, i.e., the ...
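The interval computation described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the N stochastic forward passes are already available as arrays of shape (n_passes, n_instances), and it assumes a nonconformity score equal to the absolute validation error divided by the spread of the dropout ensemble, which the excerpt does not specify. The names `conformal_intervals` and `simulate_passes` are hypothetical, and `simulate_passes` only generates synthetic stand-in data in place of a real network.

```python
import numpy as np


def conformal_intervals(val_preds, val_y, test_preds, confidence=0.8, eps=1e-8):
    """Prediction intervals from an ensemble of test-time-dropout passes.

    val_preds, test_preds: arrays of shape (n_passes, n_instances).
    val_y: observed values for the validation instances.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Point prediction and spread of the ensemble for each instance.
    val_mean, val_std = val_preds.mean(axis=0), val_preds.std(axis=0)
    test_mean, test_std = test_preds.mean(axis=0), test_preds.std(axis=0)

    # Nonconformity score (assumption): absolute error scaled by ensemble spread.
    alphas = np.sort(np.abs(val_y - val_mean) / (val_std + eps))

    # Score at the requested confidence level, with the usual (n + 1) correction.
    n = alphas.size
    k = min(n, int(np.ceil((n + 1) * confidence)))
    alpha_conf = alphas[k - 1]

    # Wider intervals for instances whose dropout passes disagree more.
    half_width = alpha_conf * (test_std + eps)
    return test_mean - half_width, test_mean + half_width


def simulate_passes(rng, n_instances, n_passes=20):
    """Synthetic stand-in for N dropout forward passes (hypothetical data)."""
    y = rng.normal(6.0, 1.0, size=n_instances)           # pIC50-like values
    spread = rng.uniform(0.2, 1.0, size=n_instances)     # per-instance difficulty
    center = y + spread * rng.normal(size=n_instances)   # biased point prediction
    passes = center + 0.5 * spread * rng.normal(size=(n_passes, n_instances))
    return passes, y


rng = np.random.default_rng(0)
val_preds, val_y = simulate_passes(rng, 400)
test_preds, test_y = simulate_passes(rng, 300)
lower, upper = conformal_intervals(val_preds, val_y, test_preds, confidence=0.8)
coverage = np.mean((test_y >= lower) & (test_y <= upper))
print(f"fraction of test values inside the 80% intervals: {coverage:.2f}")
```

In this sketch, validity corresponds to the printed coverage staying close to the requested confidence level, and efficiency corresponds to how narrow `upper - lower` is on average.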

“…Hence, the model is used to identify data points that lead to the highest information gain (exploration) as opposed to identifying newly active data points (exploitation). Previously this approach was shown to lead to a quick improvement in biological activity. Accordingly, using the current and a public dataset, a first PCM SGLT1 screening model was developed that effectively predicted moderately active SGLT1 inhibitors outside the chemical space of the training set.…”

Section: Discussion (mentioning)

confidence: 99%

Novel natural and synthetic inhibitors of solute carriers SGLT1 and SGLT2

Oranje, Gouka, Burggraaff, et al. 2019. Pharmacology Res & Perspec

Selective analogs of the natural glycoside phloridzin are marketed drugs that reduce hyperglycemia in diabetes by inhibiting the active sodium glucose cotransporter SGLT2 in the kidneys. In addition, intestinal SGLT1 is now recognized as a target for glycemic control. To expand available type 2 diabetes remedies, we aimed to find novel SGLT1 inhibitors beyond the chemical space of glycosides. We screened a bioactive compound library for SGLT1 inhibitors and tested primary hits and additional structurally similar molecules on SGLT1 and SGLT2 (SGLT1/2). Novel SGLT1/2 inhibitors were discovered in separate chemical clusters of natural and synthetic compounds. These have IC50 values in the 10-100 μmol/L range. The most potent identified novel inhibitors from different chemical clusters are (SGLT1 IC50 mean ± SD, SGLT2 IC50 mean ± SD): (+)-pteryxin (12 ± 2 μmol/L, 9 ± 4 μmol/L), (+)-ε-viniferin (58 ± 18 μmol/L, 110 μmol/L), quinidine (62 μmol/L, 56 μmol/L), cloperastine (9 ± 3 μmol/L, 9 ± 7 μmol/L), bepridil (10 ± 5 μmol/L, 14 ± 12 μmol/L), trihexyphenidyl (12 ± 1 μmol/L, 20 ± 13 μmol/L) and bupivacaine (23 ± 14 μmol/L, 43 ± 29 μmol/L). The discovered natural inhibitors may be further investigated as new potential (prophylactic) agents for controlling dietary glucose uptake. The new diverse structure-activity data can provide a starting point for the optimization of novel SGLT1/2 inhibitors and support the development of virtual SGLT1/2 inhibitor screening models.
