2022
DOI: 10.1186/s13321-022-00611-w

Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction

Abstract: Recently, imputation techniques have been adapted to predict activity values among sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. These models are able to use experimental activity values for auxiliary assays when predicting the activity of a test compound on a specific assay. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two of small scale (12 assays each) and one large scale…


Cited by 11 publications (10 citation statements)
References 34 publications
“…Prior to the launch of ChEMBL, only large private organisations were able to access diverse and high-quality (proprietary or commercial) bioactivity data sets for a wide range of biological targets at scale. Data from ChEMBL has proven indispensable for the development (26), validation and benchmarking (27–29) of a wide range of AI and other in silico applications, including those described below.…”
Section: New Features and AI Applications of Existing Data Resources; citation type: mentioning
confidence: 99%
“…For the prediction of AR activity, the Tox21 and ToxCast data sets from the program of U.S. Environmental Protection Agency (U.S. EPA) have been widely used to build the classification models that identify binders, agonists, and antagonists of AR from thousands of chemicals with diverse molecular structures. Later, the National Center of Computational Toxicology of the U.S. EPA launched the Collaborative Modeling Project of Androgen Receptor Activity (CoMPARA) to develop consensus models based on traditional ML algorithms for predicting the AR activity of man-made chemicals. Previous studies have shown that RF and SVM models with molecular descriptors or molecular fingerprints exhibited good predictive ability for AR activity. Additionally, gradient boosting decision tree models with a multiscale weighted colored graph achieve the highest balanced accuracy compared to models with different molecular fingerprints in the NR-AR and NR-AR-LBD data sets of Tox21. Evaluations by Walter et al. suggested that eXtreme Gradient Boosting (XGB) models have higher performance than multitask deep neural networks using Morgan fingerprints for the same data sets. For the 11 data sets from the AR signaling pathway of ToxCast/Tox21 assays, Bayesian machine learning models with extended-connectivity fingerprints (ECFP) of diameter 6 show better statistical performance than other ML models such as RF and SVM…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Compounds with experimental data against multiple or ideally all targets are rare, making the data density of the molecules–targets matrix very low. Data imputation has been proposed as a solution. Imputation is the process of using predicted values for missing data points in the data set used to train the machine learning models. The complexity of the imputation strategy to obtain the predictions ranges from the simple computation of the mean value of the known data points per task to the use of deep learning models.…”
Section: Introduction; citation type: mentioning
confidence: 99%
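
The simplest strategy mentioned in the statement above, filling each missing entry with the mean of the known values for that task, can be sketched roughly as follows. This is an illustrative sketch only, not the method of the cited paper or of the citing authors: the function names `impute_task_means` and `data_density`, the use of NaN to mark missing measurements, and the toy matrix are all assumptions made for the example.

```python
import numpy as np


def data_density(activity):
    """Fraction of the molecules-by-targets matrix that holds a measured value."""
    activity = np.asarray(activity, dtype=float)
    return float((~np.isnan(activity)).mean())


def impute_task_means(activity):
    """Replace each missing entry (NaN) with the mean of the known values
    in the same column (task). Columns with no known values stay NaN."""
    activity = np.asarray(activity, dtype=float)
    known = ~np.isnan(activity)

    # Per-task sum and count over the known entries only.
    sums = np.where(known, activity, 0.0).sum(axis=0)
    counts = known.sum(axis=0)

    # Avoid dividing by zero for tasks without any measurement.
    means = np.divide(
        sums,
        counts,
        out=np.full(activity.shape[1], np.nan),
        where=counts > 0,
    )

    # Keep measured values; fill the gaps with the per-task mean.
    return np.where(known, activity, means)


if __name__ == "__main__":
    # Hypothetical toy matrix: rows are compounds, columns are assays (tasks),
    # 1.0 = active, 0.0 = inactive, NaN = not measured.
    matrix = np.array(
        [
            [1.0, np.nan, 0.0],
            [np.nan, 1.0, np.nan],
            [0.0, np.nan, np.nan],
            [1.0, 0.0, np.nan],
        ]
    )
    print("density:", data_density(matrix))
    print(impute_task_means(matrix))
```

For classification labels the imputed value is then a per-task active fraction rather than a hard label. A real pipeline would also have to keep the imputed values out of the evaluation set, which this sketch does not address.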