Precision Oncology beyond Targeted Therapy: Combining Omics Data with Machine Learning Matches the Majority of Cancer Cells to Effective Therapeutics

Ding, Michael Q.; Chen, Lujia; Cooper, Gregory F.; Young, Jonathan D.; Lu, Xinghua

doi:10.1158/1541-7786.mcr-17-0378

Cited by 145 publications

(125 citation statements)

References 28 publications

Supporting

Mentioning

125

Contrasting

“…Specifically, we tried architectures from 978-500-15 to 978-2000-1000-200 to select a model with as simple a structure as possible and a low training error. Based on our previous experience, a three-hidden-layer model with 1000-1500 nodes in the first hidden layer, ~1000 nodes in the second hidden layer and a small bottleneck in the third hidden layer usually performs best (Chen et al, 2016; Ding et al, 2018). The best model we achieved in this study had a structure of 978-1000-1000-100.…”
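To make the layer notation in the statement above concrete, here is a minimal sketch of a fully connected 978-1000-1000-100 stack in NumPy. Only the layer sizes come from the quoted statement; the ReLU activation, the weight initialization, and the names `init_layers`, `encode`, and `count_parameters` are illustrative assumptions, not the citing authors' implementation.

```python
import numpy as np

# Layer widths taken from the quoted statement: 978 inputs, then hidden
# layers of 1000, 1000, and 100 units (the bottleneck).
LAYER_SIZES = [978, 1000, 1000, 100]


def init_layers(sizes, seed=0):
    """Create (weights, bias) pairs for consecutive layer sizes.

    The scaled-normal initialization is an assumption for illustration.
    """
    rng = np.random.default_rng(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        layers.append((w, b))
    return layers


def encode(x, layers):
    """Forward pass through the stack; ReLU is an assumed activation."""
    h = x
    for w, b in layers:
        h = np.maximum(h @ w + b, 0.0)
    return h


def count_parameters(layers):
    """Total number of weights and biases in the stack."""
    return sum(w.size + b.size for w, b in layers)


layers = init_layers(LAYER_SIZES)
# Synthetic batch of 4 profiles with 978 features each (random data).
batch = np.random.default_rng(1).normal(size=(4, 978))
codes = encode(batch, layers)
```

Under these assumptions each 978-feature input is mapped to a 100-dimensional code, and `count_parameters(layers)` shows how quickly model size grows with wider hidden layers, which is the trade-off the statement describes between a simple structure and low training error.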

Section: Model Architecture and Training Setting (mentioning)

confidence: 68%

Learning to Encode Cellular Responses to Systematic Perturbations with Deep Generative Models

Xue, Ding, Lu

2020

Preprint

Self Cite

Components of cellular signaling systems are organized as hierarchical networks, and perturbing different components of the system often leads to transcriptomic profiles that exhibit compositional statistical patterns. Mining such patterns to investigate how cellular signals are encoded is an important problem in systems biology. Here, we investigated the capability of deep generative models (DGMs) for modeling signaling systems and learning representations for transcriptomic profiles derived from cells under diverse perturbations. Specifically, we show that the variational autoencoder and the supervised vector-quantized variational autoencoder can accurately regenerate gene expression data. Both models can learn representations that reveal the relationships between different classes of perturbagens and enable mappings between drugs and their target genes. In summary, DGMs can adequately depict how cellular signals are encoded. The resulting representations have broad applications in systems biology, such as studying the mechanism-of-action of drugs.

“…Its ability to reduce dimensionality and extract non-linear features [56] has been leveraged by many studies. In one oncology study, autoencoders extracted cellular features that correlate with drug sensitivity in cancer cell lines [57]. An autoencoder was also used to discover two liver cancer subtypes with distinguishable chances of survival [58].…”

Section: Autoencoder (AE) (mentioning)

confidence: 99%

Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices

Bhattacharjee, Bayzid

2019

Preprint

Background: Due to recent advances in sequencing technologies and species tree estimation methods capable of taking gene tree discordance into account, notable progress has been achieved in constructing large-scale phylogenetic trees from genome-wide data. However, substantial challenges remain in leveraging this huge amount of molecular data. One of the foremost among these challenges is the need for efficient tools that can handle missing data. Popular distance-based methods such as neighbor joining and UPGMA require that the input distance matrix does not contain any missing values. Results: We introduce two highly accurate machine learning based distance imputation techniques. One of our approaches is based on matrix factorization, and the other is an autoencoder-based deep learning technique. We evaluate these two techniques on a collection of simulated and biological datasets, and show that our techniques match or improve upon the best alternate techniques for distance imputation. Moreover, our proposed techniques can handle a substantial amount of missing data, to the extent where the best alternate methods fail. Conclusions: This study shows for the first time the power and feasibility of applying deep learning techniques for imputing distance matrices. The autoencoder-based deep learning technique is highly accurate and scalable to large datasets. We have made these techniques freely available as cross-platform software (available at https://github.com/Ananya-Bhattacharjee/ImputeDistances).

“…As a result, many studies have focused on large pre-clinical pharmacogenomics datasets such as cancer cell lines as a proxy for patients (Barretina et al, 2012; Iorio et al, 2016). A majority of the current computational methods are trained on cell line datasets and then tested on other cell line or patient datasets (Sharifi-Noghabi et al, 2019b; Sakellaropoulos et al, 2019; Mourragui et al, 2019; Rampášek et al, 2019; Ding et al, 2018; Geeleher et al, 2017, 2014). However, cell line and patient data, even with the same set of genes, do not have identical distributions due to the lack of an immune system and the tumor microenvironment in cell lines, which means a model cannot be trained on cell lines and then tested on patients (Mourragui et al, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

AITL: Adversarial Inductive Transfer Learning with input and output space adaptation for pharmacogenomics

Noghabi, Peng, Zolotareva et al.

2020

Preprint

Motivation: The goal of pharmacogenomics is to predict drug response in patients using their single- or multi-omics data. A major challenge is that clinical data (i.e. patients) with drug response outcome is very limited, creating a need for transfer learning to bridge the gap between large pre-clinical pharmacogenomics datasets (e.g. cancer cell lines), as a source domain, and clinical datasets as a target domain. Two major discrepancies exist between pre-clinical and clinical datasets: 1) in the input space, the gene expression data differ due to differences in the basic biology, and 2) in the output space, the measures of drug response differ. Therefore, training a computational model on cell lines and testing it on patients violates the i.i.d. assumption that train and test data are from the same distribution. Results: We propose Adversarial Inductive Transfer Learning (AITL), a deep neural network method for addressing discrepancies in input and output space between the pre-clinical and clinical datasets. AITL takes gene expression of patients and cell lines as the input, employs adversarial domain adaptation and multi-task learning to address these discrepancies, and predicts the drug response as the output. To the best of our knowledge, AITL is the first adversarial inductive transfer learning method to address both input and output discrepancies. Experimental results indicate that AITL outperforms state-of-the-art pharmacogenomics and transfer learning baselines and may guide precision oncology more accurately. Availability of codes and supplementary material: https://github.com/hosseinshn/AITL Contact: ccollins@prostatecentre.com and ester@cs.sfu.ca
