2020
DOI: 10.1021/acsomega.0c01251

Pruned Machine Learning Models to Predict Aqueous Solubility

Abstract: Solubility is a key metric for therapeutic compounds. Conversely, insoluble compounds cloud the accuracy of assays at all stages of chemical biology and drug discovery. Herein, we disclose naïve Bayesian classifier models to predict aqueous solubility. Publicly accessible aqueous solubility data were used to create two full, or nonpruned, training sets. These two sets were also combined to create a full fused set, and a training set comprised of a literature collation of solubility data was also considered as …

Citations: cited by 15 publications (14 citation statements)
References: 35 publications
“…As a result of regression, 18 out of 29 test sets showed R² > 0.80 (as can be seen in Figure S14). This suggests that solubility between unknown compounds can be still predicted using one of our models, as opposed to approaches in previous studies 20, 25, 26, 30–32 where individual models were trained with sufficient data for solvents.…”
Section: Results and Discussion (mentioning)
confidence: 97%
“…Recent works on predicting solubility through machine learning utilized molecular fingerprints in their work. 20, 24–26 Previous studies use machine learning/deep learning (ML/DL) methods including random forest (RF), support vector regression (SVR), LightGBM, LASSO and so on, 20, 24, 26 Naïve-Bayes based models, 25 and also deep learning to predict solubility. While the importance of solubility prediction has been emphasized, various studies have been on solubility prediction has been reported.…”
Section: Introduction (mentioning)
confidence: 99%
“…A simple search on ChEMBL on February 2, 2021, with the keyword “solubility” resulted in 7546 assay records and a total of 9790 measurements. Perryman et al 44 collected solubility data from ChEMBL and PubChem databases and proposed Bayesian models to predict aqueous solubility. This PubChem dataset comprised a total of 57,824 compounds, out of which 31,644 compounds (54.7%) were defined as soluble.…”
Section: Results (mentioning)
confidence: 99%
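
The class balance quoted in the statement above can be checked with a short calculation. This is only an arithmetic sketch using the two figures given in the quote (57,824 compounds, 31,644 soluble); the variable names are illustrative, and the insoluble count is derived here on the assumption that every compound is labeled either soluble or insoluble.

```python
# Figures taken from the quoted citation statement (PubChem dataset).
total_compounds = 57_824
soluble_compounds = 31_644

# Assumption: binary labels, so the remainder is the insoluble class.
insoluble_compounds = total_compounds - soluble_compounds
soluble_fraction = soluble_compounds / total_compounds

print(f"soluble:   {soluble_compounds} ({soluble_fraction:.1%})")
print(f"insoluble: {insoluble_compounds} ({1 - soluble_fraction:.1%})")
```

The soluble share reproduces the 54.7% stated in the quote, so the two classes are roughly balanced, which matters when judging the accuracy of a binary classifier.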
“…Given the continuous nature of this metric and our inability to set a clear, quantitative threshold separating "good" from "bad" outcomes, we chose to pursue machine learning regression models in Knime 21 in lieu of the binary classification approach we have used previously for antibacterial growth inhibition, 22,23 Vero cell cytotoxicity, 24 mouse liver microsome stability, 25 and aqueous solubility. 26 The first set of models leveraged a linear regression approach.…”
Section: Results (mentioning)
confidence: 99%
“…In particular, mouse PK studies are a critical hurdle to clear before commencing assays to quantify efficacy and/or mechanism of action in mouse models of disease. As the PK profile of a small molecule is influenced by its physiochemical and absorption−distribution−metabolism−excretion (ADME) characteristics, it is not surprising that numerous reports exist from our laboratories 25,26 and others 18,30,31 as to machine learning models for these contributing properties as well as surrogate measures (e.g. Caco-2 cell permeability).…”
Section: Discussion (mentioning)
confidence: 99%