2015
DOI: 10.1002/cmdc.201500424

How Consistent are Publicly Reported Cytotoxicity Data? Large‐Scale Statistical Analysis of the Concordance of Public Independent Cytotoxicity Measurements

Abstract: While increased attention is being paid to the impact of data quality in cell-line sensitivity and toxicology modeling, to date no systematic study has evaluated the comparability of independent cytotoxicity measurements on a large scale. Here, we estimate the experimental uncertainty of public cytotoxicity data from ChEMBL version 19. We applied stringent filtering criteria to assemble a curated data set composed of pIC50 data for compound-cell line systems measured in independent laboratories. The estimate…
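One way to put a number on the experimental uncertainty described in the abstract is to compare independent pIC50 values reported for the same compound-cell line system. The abstract is truncated before the estimate is given, so the sketch below is not the study's procedure. It assumes that errors from different laboratories are independent, zero-mean and of equal variance σ². Under that assumption the difference of two measurements has variance 2σ², so σ can be estimated as sqrt(mean(d²) / 2). The function names, the data layout and the compound identifiers are hypothetical.

```python
import math
from itertools import combinations


def pairwise_differences(measurements):
    """Absolute pIC50 differences for every pair of independent measurements.

    `measurements` (hypothetical layout) maps a (compound, cell_line) key to a
    list of pIC50 values, each assumed to come from a different laboratory.
    """
    diffs = []
    for values in measurements.values():
        for a, b in combinations(values, 2):
            diffs.append(abs(a - b))
    return diffs


def estimate_sigma(diffs):
    """Per-measurement SD, assuming independent, equal-variance errors.

    Var(a - b) = 2 * sigma**2, hence sigma = sqrt(mean(d**2) / 2).
    """
    if not diffs:
        raise ValueError("need at least one pair of measurements")
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


# Hypothetical example: two systems measured in two and three laboratories.
data = {
    ("cmpd_A", "HCT-15"): [6.0, 6.5],
    ("cmpd_B", "LoVo"): [5.0, 5.0, 6.0],
}
print(estimate_sigma(pairwise_differences(data)))
```

A system measured in three laboratories contributes three pairs, so it carries more weight than a system measured twice. Averaging per system first is an alternative if that weighting is undesirable.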

Cited by 27 publications (46 citation statements)
References 70 publications
“…We obtained high performance on the test set for all networks, with mean RMSE values in the 0.65-0.96 pIC50 range (Figure 2). These errors in prediction are comparable to the uncertainty of heterogeneous IC50 measurements in ChEMBL [8], and to the performance of drug sensitivity prediction models previously reported [15,18,80]. Notably, high performance was also obtained for data sets containing few hundred compounds (e.g., LoVo or HCT-15), suggesting that the framework proposed here is applicable to model small data sets.…”
Section: Results (supporting)
confidence: 83%
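pIC50 is a base-10 logarithm, so an error expressed in pIC50 units translates into a multiplicative factor on the IC50 scale. The helpers below are illustrative and their names are hypothetical. The first converts a nanomolar IC50 to pIC50 = -log10(IC50 in M) = 9 - log10(IC50 in nM). The second returns 10^RMSE as a rough fold-change scale. By that measure, the quoted RMSE values of 0.65 and 0.96 correspond to factors of roughly 4.5 and 9.1 in IC50.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def fold_error(error_pic50):
    """Multiplicative IC50 factor corresponding to an error in pIC50 units."""
    return 10 ** error_pic50


print(pic50_from_nm(1000))  # 1 uM
print(fold_error(0.65), fold_error(0.96))
```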
“…We gathered cytotoxicity IC50 data for 8 cancer cell lines and 25 protein targets from ChEMBL database version 23 using the chembl_webresource_client Python module [63][64][65]. To gather high-quality bioactivity data sets, we only kept IC50 values for small molecules that satisfied the following stringent filtering criteria [8]: (i) activity unit equal to "nM", and (ii) activity relationship equal to "=". The average pIC50 value was calculated when multiple IC50 values were annotated for the same compound-cell line or compound-protein pair.…”
Section: Data Collection and Curation (mentioning)
confidence: 99%
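The two filtering criteria and the averaging step quoted above can be sketched over plain Python dictionaries. This does not call chembl_webresource_client. The record keys (`standard_units`, `standard_relation`, `standard_value`, `molecule_chembl_id`, `target_chembl_id`) are assumptions modelled on ChEMBL activity records and should be checked against real client output. The small-molecule filter is not shown. Skipping non-positive values is an added safeguard, not one of the quoted criteria.

```python
import math
from collections import defaultdict


def curate(records):
    """Keep exact ('=') IC50 values reported in nM and average pIC50 per pair.

    Record keys are assumptions modelled on ChEMBL activity records.
    Returns {(molecule_id, target_id): mean pIC50}.
    """
    grouped = defaultdict(list)
    for r in records:
        # Criterion (i): unit is nM; criterion (ii): relation is '='.
        if r.get("standard_units") != "nM" or r.get("standard_relation") != "=":
            continue
        value = r.get("standard_value")
        # Added safeguard: the log is undefined for missing or non-positive values.
        if value is None or float(value) <= 0:
            continue
        pic50 = 9.0 - math.log10(float(value))
        grouped[(r["molecule_chembl_id"], r["target_chembl_id"])].append(pic50)
    # Average in pIC50 (log) space, as described in the quoted text.
    return {key: sum(v) / len(v) for key, v in grouped.items()}
```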
“…The mean R² values (averaged across 5 replicates) for the observed against the predicted pIC50 values on the set were above 0.5 for all data sets (see Figure S1 for details), indicating that our choice of descriptors provides a molecular representation that captures aspects of the chemical structures related to bioactivity. The average RMSE_test values were in the 0.5-0.9 range, consistent with the expected modelling errors for heterogeneous IC50 data from ChEMBL [75,76].…”
Section: Data Set Modelability (supporting)
confidence: 81%
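The replicate-averaged metrics mentioned in the quote can be computed as in the sketch below. The quote does not say which definition of R² was used. This sketch assumes the coefficient of determination, 1 - SS_res / SS_tot, rather than the squared Pearson correlation. The function names are hypothetical.

```python
import math
from statistics import mean


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (pIC50)."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def replicate_summary(replicates):
    """Mean R^2 and mean RMSE over a list of (observed, predicted) replicates."""
    r2s = [r_squared(o, p) for o, p in replicates]
    errors = [rmse(o, p) for o, p in replicates]
    return mean(r2s), mean(errors)
```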
“…This is consistent with the formulation of RF, as RF predictions are the average value of those similar instances in the training data. Hence, RF models never generate predictions outside the range of activities comprised in the training data [76]. By contrast, Ridge Regression models often extrapolated compound activity to values outside those present in the training set, generating predictions higher than the maximum activity value in the training set (Figure S3).…”
Section: Extrapolation Power of RF and Ridge Regression (mentioning)
confidence: 99%
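The contrast described in the last quote can be reproduced on a toy one-descriptor data set. To keep the sketch dependency-free, it does not implement a random forest. Instead it uses a k-nearest-neighbour average as a stand-in that shares the relevant property: every prediction is an average of training targets. It compares that stand-in with closed-form 1-D ridge regression using an unpenalized intercept. With scikit-learn, `RandomForestRegressor` and `Ridge` would show the same contrast. The data and function names are hypothetical.

```python
from statistics import mean


def knn_average(train_x, train_y, x, k=2):
    """Average the targets of the k training points closest to x (1-D).

    Stand-in for RF: the prediction is an average of training targets, so it
    can never leave [min(train_y), max(train_y)].
    """
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)


def ridge_1d(train_x, train_y, alpha=1.0):
    """Closed-form 1-D ridge regression; the intercept is not penalized."""
    mx, my = mean(train_x), mean(train_y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(train_x, train_y))
    sxx = sum((a - mx) ** 2 for a in train_x)
    slope = sxy / (sxx + alpha)
    return lambda x: my + slope * (x - mx)


# Hypothetical descriptor values and pIC50 targets.
tx = [0, 1, 2, 3, 4]
ty = [4.0, 5.0, 6.0, 7.0, 8.0]
model = ridge_1d(tx, ty)
print(knn_average(tx, ty, 10), model(10))  # query far outside the training range
```

The averaging model stays at or below the largest training pIC50 (8.0), while the linear model follows its slope beyond it. This mirrors the behaviour reported for RF and Ridge Regression.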