Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel

Cortés-Ciriano, Isidro; Westen, Gerard J. P. van; Bouvier, Guillaume; Nilges, Michaël; Overington, John P.; Bender, Andreas; Malliavin, Thérèse E.

doi:10.1093/bioinformatics/btv529

Cited by 112 publications

(140 citation statements)

References 53 publications

(76 reference statements)

Supporting

Mentioning

137

Contrasting

Unclassified


“…[30] and Kalliokoski et al [31] set the basis to estimate the maximum achievable performance of in silico models trained on data issued from different laboratories, which is quantified through the maximum achievable R² values on a test or external set. [39] While cellular sensitivity data are of relevance in predictive modeling as a whole, [33][34][35][40][41][42] and in the fields of cell-line sensitivity and toxicology modeling in particular, [40,41][43][44][45][46][47] to date, no systematic study has evaluated the comparability of public in vitro cytotoxicity data on a large scale. To address this shortage, we have implemented in R a pipeline for the automatic extraction and curation of cell-line sensitivity data from ChEMBL version 19.…”

Section: Introduction (mentioning)

confidence: 99%

How Consistent are Publicly Reported Cytotoxicity Data? Large‐Scale Statistical Analysis of the Concordance of Public Independent Cytotoxicity Measurements

Cortés-Ciriano

Bender

2015

ChemMedChem

Self Cite


While increased attention is being paid to the impact of data quality in cell-line sensitivity and toxicology modeling, to date, no systematic study has evaluated the comparability of independent cytotoxicity measurements on a large scale. Here, we estimate the experimental uncertainty of public cytotoxicity data from ChEMBL version 19. We applied stringent filtering criteria to assemble a curated data set comprised of pIC50 data for compound-cell line systems measured in independent laboratories. The estimated experimental uncertainty calculated was a mean unsigned error (MUE) value of 0.61-0.76, a median unsigned error (MedUE) value of 0.51-0.58, and a standard deviation of 0.76-1.00 pIC50 units. The experimental uncertainty (σE) estimated from all pairs of cytotoxicity measurements with a ΔpIC50 value lower than 2.5 was found to be 0.59-0.77 pIC50 units, and thus 21-60% and 21-26% higher than that of pKi and pIC50 data for ligand-protein data (σE = 0.47-0.48 pKi units and σE = 0.57-0.61 pIC50 units, respectively). The estimated σE value from the pairs of pIC50 values measured with metabolic assays was 0.98, whereas the σE value was found to be 0.69 when using the 1388 pIC50 pairs measured using exactly the same experimental setup. The maximum achievable Pearson correlation coefficient (R²Pearson,max) of in silico models trained on cytotoxicity data from different laboratories was estimated to be 0.51-0.85, which is considerably different from the value of 1 corresponding to perfect predictions, hinting at the maximum performance one can expect also from computational cytotoxicity predictions. The lowest concordance between pairs of measurements was found for the drugs paclitaxel, methotrexate, zidovudine, and docetaxel, and for the cell lines HepG2, NCI-H460, L1210, and CCRF-CEM, hinting at particular sensitivity of those systems to experimental setups. The highest concordance was estimated for the compound-cell line system HL-60-etoposide (σE = 0.70), whereas the lowest for L1210-methotrexate (σE = 1.68). We found that annotation errors are responsible for the high discordance observed for some pairs of measurements, pointing out the importance of data curation when automatically extracting cytotoxicity data from public databases. Likewise, these results highlight the importance of estimating compound cytotoxicity with assays providing complementary biological information (i.e., metabolic, clonogenic and assays based on cell membrane integrity), especially when the mechanism of action of test compounds is unknown. From this analysis, guidelines can be created on the reliability of cytotoxicity data from public databases, which could ultimately prove valuable for modeling purposes, and to guide reporting of data in the literature.
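The error summaries named in the abstract above (MUE, MedUE, σE) can be illustrated with a small sketch. It assumes that each compound-cell line system contributes two independent pIC50 values with the same error and no systematic offset, so that the variance of their difference is twice σE². The function name `pair_uncertainty`, the input format, the toy values, and this particular estimator are illustrative assumptions, not the authors' R pipeline.

```python
import math
import statistics


def pair_uncertainty(pairs, max_delta=2.5):
    """Summarise disagreement between paired pIC50 measurements.

    pairs: list of (pIC50_lab_a, pIC50_lab_b) tuples (hypothetical format).
    max_delta: pairs with |delta| >= max_delta are excluded from sigma_E only.
    """
    deltas = [a - b for a, b in pairs]
    abs_deltas = [abs(d) for d in deltas]

    mue = statistics.fmean(abs_deltas)      # mean unsigned error
    medue = statistics.median(abs_deltas)   # median unsigned error

    kept = [d for d in deltas if abs(d) < max_delta]
    if not kept:
        raise ValueError("no pairs left after filtering")

    # Assumption: both measurements carry independent error with SD sigma_E
    # and no systematic offset, so Var(delta) = 2 * sigma_E**2.
    sigma_e = math.sqrt(sum(d * d for d in kept) / (2 * len(kept)))
    return {"MUE": mue, "MedUE": medue, "sigma_E": sigma_e}


# Toy data only; these are not values from ChEMBL.
toy_pairs = [(6.0, 6.5), (7.0, 6.0), (5.0, 5.0), (8.0, 4.0)]
summary = pair_uncertainty(toy_pairs)
print(summary)
```

In this toy input the last pair differs by 4 pIC50 units, so it contributes to MUE and MedUE but is dropped by the ΔpIC50 filter before σE is computed.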

“…It is important to consider that correlation metrics depend on the range of the dependent variable, and hence one might obtain low errors in prediction (i.e., low RMSE values) and yet a low R² value if the dependent variable spans few bioactivity units (Alexander, Tropsha, & Winkler, 2015; Cortés-Ciriano et al, 2016). Determining whether a given model shows good generalization capabilities depends on the drug discovery stage in which it is applied.…”

Section: Understanding Results (mentioning)

confidence: 99%
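The range dependence described in the statement above can be made concrete with a toy calculation. The sketch below applies identical prediction errors to a narrow and a wide set of pIC50 values. It assumes R² means the coefficient of determination (the quoted text may instead mean a squared correlation), and all numbers are invented for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# The same absolute errors are applied to both toy data sets.
errors = [0.3, -0.3, 0.3, -0.3, 0.3]
narrow = [6.0, 6.2, 6.4, 6.6, 6.8]    # spans 0.8 pIC50 units
wide = [4.0, 5.5, 7.0, 8.5, 10.0]     # spans 6 pIC50 units

narrow_pred = [y + e for y, e in zip(narrow, errors)]
wide_pred = [y + e for y, e in zip(wide, errors)]

print("narrow:", rmse(narrow, narrow_pred), r2(narrow, narrow_pred))
print("wide:  ", rmse(wide, wide_pred), r2(wide, wide_pred))
```

Both sets have an RMSE of about 0.3, yet R² is slightly negative for the narrow set and close to 1 for the wide one, which is why the two metrics are best read together.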

Elucidating Compound Mechanism of Action and Predicting Cytotoxicity Using Machine Learning Approaches, Taking Prediction Confidence into Account

Drakakis

Cortés-Ciriano

Alexander-Dann

et al. 2019

CP Chemical Biology

Self Cite


The modes of action (MoAs) of drugs frequently are unknown, because many are small molecules initially identified from phenotypic screens, giving rise to the need to elucidate their MoAs. In addition, the high attrition rate for candidate drugs in preclinical studies due to intolerable toxicity has motivated the development of computational approaches to predict drug candidate (cyto)toxicity as early as possible in the drug-discovery process. Here, we provide detailed instructions for capitalizing on bioactivity predictions to elucidate the MoAs of small molecules and infer their underlying phenotypic effects. We illustrate how these predictions can be used to infer the underlying antidepressive effects of marketed drugs. We also provide the necessary functionalities to model cytotoxicity data using single and ensemble machine-learning algorithms. Finally, we give detailed instructions on how to calculate confidence intervals for individual predictions using the conformal prediction framework. © 2019 by John Wiley & Sons, Inc. Keywords: ChEMBL; cytotoxicity; in silico bioactivity prediction; mechanism of action; polypharmacology; toxicology modeling
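As a rough illustration of the conformal prediction idea mentioned in the abstract above, the sketch below builds a split (inductive) conformal interval from absolute residuals on a held-out calibration set. It is a simplified, unnormalised variant chosen for brevity; the protocol itself may use a different nonconformity measure, and the function name `conformal_interval` and the toy values are assumptions.

```python
import math


def conformal_interval(calib_true, calib_pred, new_pred, confidence=0.8):
    """Split-conformal interval around a single point prediction.

    calib_true / calib_pred: observed and predicted values for a calibration
    set that was not used to train the underlying model.
    """
    # Nonconformity score: absolute residual on each calibration example.
    scores = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(scores)

    # Rank of the score used as half-width: ceil((n + 1) * confidence).
    k = math.ceil((n + 1) * confidence)
    if k > n:
        raise ValueError("calibration set too small for this confidence level")

    half_width = scores[k - 1]
    return new_pred - half_width, new_pred + half_width


# Toy calibration data with absolute residuals 0.1 ... 0.9 (invented values).
calib_pred = [0.0] * 9
calib_true = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

low, high = conformal_interval(calib_true, calib_pred, new_pred=6.0, confidence=0.5)
print(low, high)
```

Under the usual exchangeability assumption, intervals built this way cover the true value at least at the requested confidence level on average; every prediction gets the same width here because the residuals are not scaled per compound.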


“…We used the recommended values for RF hyperparameters (1000 for the number of trees and the square root of the number of considered features for mtry). We preferred this to tuning these hyperparameters for each training set, as RF tuning generally results in just marginal improvements at the cost of being much more computationally expensive [38,56,57]. As no model selection was carried out for this algorithm, standard LOOCV was performed to estimate the performance of RF using all the features (RF-all) on each data set (treatment-cancer type-molecular profile).…”

Section: Multi-gene Classifiers With Built-in Feature Selection (FS) (mentioning)

confidence: 99%

“…Indeed, while typically only tens of tumours have their response to the drug available, the molecular profiles of these tumours may easily aggregate over 50,000 features. To face this challenge, ML algorithms with built-in FS such as Elastic Nets [32-36], Ridge [33,36], LASSO [33,34,36,37] or Random Forest (RF) [12,33,35,36,38,39] have been used to model pharmacogenomics data from in vitro cell lines. For instance, RF ignores those features irrelevant for predicting drug response and thus has been able to tackle to some extent this challenge.…”

Section: Introduction (mentioning)

confidence: 99%

Machine learning models to predict in vivo drug response via optimal dimensionality reduction of tumour molecular profiles

Nguyen

Naulaerts

Bomane

et al. 2018

Preprint


Inter-tumour heterogeneity is one of cancer's most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, lessons from the past indicate that single-gene markers of response are rare and/or often fail to achieve a significant impact in clinic. In this context, Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms with this purpose. Results show that combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: Paclitaxel (breast cancer), Binimetinib (breast cancer) and Cetuximab (colorectal cancer). Interestingly, each of these ML models identifies some responsive PDXs not harbouring the best actionable mutation for that case (such PDXs were missed by those single-gene markers). Moreover, ML multi-gene predictors generally retrieve a much higher proportion of treatment-sensitive PDXs than the corresponding single-gene marker. As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if multiple ML algorithms were applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.
