Preprint (2023)
DOI: 10.1101/2023.02.09.527832

Benchmarking Variational AutoEncoders on cancer transcriptomics data

Abstract: Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks such as cancer type prediction or cancer subtyping. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To get a better understanding of the importance of the different hyperparameters, we exami…

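For orientation, the sketch below shows a minimal VAE of the kind discussed in the abstract, written in PyTorch. The layer sizes, latent dimension, and KL weight (beta) are illustrative assumptions only, not the configurations benchmarked in the preprint.

```python
# Minimal VAE sketch (PyTorch). Hidden size, latent dimension, beta, and the
# reconstruction loss below are illustrative assumptions, not the values
# benchmarked in the preprint.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, n_genes, hidden=512, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_genes, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_genes))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction term plus beta-weighted KL divergence to N(0, I).
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```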
Cited by 4 publications (5 citation statements)
References 35 publications (44 reference statements)
“…The performances of the methods were estimated using another held-out test set comprising previously unseen data points. If, for some reason, the validation loss is not predictive of the performance at a specific downstream task - as we have previously shown can be the case for VAEs trained on RNA-Seq data [28] - then the chosen hyperparameters for a given model might not be the optimal ones for the downstream task. Our results here hint that this might be the case in our setting too.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
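As a minimal illustration of the point made in this excerpt, the sketch below compares selecting a model by validation loss with selecting it by a downstream metric. The runs and their scores are hypothetical values invented for illustration, not results from the preprint or the citing papers.

```python
# Sketch of hyperparameter selection by validation loss vs. downstream score.
# `runs` is a hypothetical record of already-trained configurations; the field
# names and numbers are illustrative only.
runs = [
    {"beta": 0.5, "latent": 16, "val_loss": 0.81, "downstream_acc": 0.74},
    {"beta": 1.0, "latent": 32, "val_loss": 0.78, "downstream_acc": 0.69},
    {"beta": 2.0, "latent": 64, "val_loss": 0.83, "downstream_acc": 0.77},
]

by_val_loss = min(runs, key=lambda r: r["val_loss"])
by_downstream = max(runs, key=lambda r: r["downstream_acc"])

if by_val_loss is not by_downstream:
    # The configuration with the lowest validation loss need not be the one
    # that performs best on the downstream task (e.g., cancer type prediction).
    print("lowest val_loss :", by_val_loss)
    print("best accuracy   :", by_downstream)
```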
“…The identification of disease subtypes requires performing a clustering algorithm at some point. Even though iterative training of the clustering in a joint autoencoder loss function can overcome inconsistencies between training and downstream clustering performance [8, 19, 21], we opted for a decoupled strategy. This was to 1) avoid having too many terms in the loss function, which can confuse training, and 2) reduce the computation time and initialization burden of iteratively training the clustering in a joint loss function.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…In our study, clustering is done as an additional step on the latent features to guarantee the optimal performance of the deconfounding autoencoder. However, other researchers have pointed out the inconsistency between validation loss and downstream clustering performance [34] and proposed to iteratively train the clustering in a joint loss function with the autoencoder [35, 36]. We chose the standalone clustering strategy over the iterative one because 1) we want to avoid having too many terms in the loss function, which could confuse the training process, and 2) the iterative clustering relies on a good initialization and is computationally expensive.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
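A minimal sketch of the decoupled strategy described in the two excerpts above, assuming a trained encoder (as in the earlier VAE sketch) and scikit-learn: clustering is run once on the frozen latent features rather than optimized inside the autoencoder loss. The variable names, the number of clusters, and the use of k-means are illustrative assumptions, not the exact pipelines of the citing papers.

```python
# Decoupled clustering sketch: cluster frozen latent features after training,
# instead of adding a clustering term to the autoencoder loss.
# `vae` and `expression_tensor` are assumed to exist from earlier training.
import torch
from sklearn.cluster import KMeans

vae.eval()
with torch.no_grad():
    h = vae.enc(expression_tensor)
    latent_mu = vae.mu(h)            # use the posterior means as features

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)  # cluster count is illustrative
subtype_labels = kmeans.fit_predict(latent_mu.cpu().numpy())
```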
“…The performances of the methods were estimated using another held-out test set comprising previously unseen data points. If, for some reason, the validation loss is not predictive of the performance at a specific downstream task – as we have previously shown can be the case for VAEs trained on RNA-Seq data [30] – then the chosen hyperparameters for a given model might not be the optimal ones for the downstream task. Our results here hint that this might be the case in our setting too.…”
Section: Discussion (citation type: mentioning)
confidence: 99%