Preprint (2023)
DOI: 10.1101/2023.02.09.527832

Benchmarking Variational AutoEncoders on cancer transcriptomics data

Abstract: Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks such as cancer type prediction or cancer subtyping. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To get a better understanding of the importance of the different hyperparameters, we exami…

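For orientation, the sketch below shows a minimal VAE of the kind discussed in the abstract, written in PyTorch. The layer sizes, latent dimension, and KL weight (beta) are illustrative assumptions only, not the configurations benchmarked in the preprint.

```python
# Minimal VAE sketch (PyTorch). Hidden size, latent dimension, beta, and the
# reconstruction loss below are illustrative assumptions, not the values
# benchmarked in the preprint.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, n_genes, hidden=512, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_genes, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_genes))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction term plus beta-weighted KL divergence to N(0, I).
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```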
Cited by 4 publications (5 citation statements)
References 35 publications (44 reference statements)
“…The performances of the methods were estimated using another held-out test set comprising previously unseen data points. If, for some reason, the validation loss is not predictive of the performance at a specific downstream task - as we have previously shown can be the case for VAEs trained on RNA-Seq data [28] - then the chosen hyperparameters for a given model might not be the optimal ones for the downstream task. Our results here hint that this might be the case in our setting too.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
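As a minimal illustration of the point made in this excerpt, the sketch below compares selecting a model by validation loss with selecting it by a downstream metric. The runs and their scores are hypothetical values invented for illustration, not results from the preprint or the citing papers.

```python
# Sketch of hyperparameter selection by validation loss vs. downstream score.
# `runs` is a hypothetical record of already-trained configurations; the field
# names and numbers are illustrative only.
runs = [
    {"beta": 0.5, "latent": 16, "val_loss": 0.81, "downstream_acc": 0.74},
    {"beta": 1.0, "latent": 32, "val_loss": 0.78, "downstream_acc": 0.69},
    {"beta": 2.0, "latent": 64, "val_loss": 0.83, "downstream_acc": 0.77},
]

by_val_loss = min(runs, key=lambda r: r["val_loss"])
by_downstream = max(runs, key=lambda r: r["downstream_acc"])

if by_val_loss is not by_downstream:
    # The configuration with the lowest validation loss need not be the one
    # that performs best on the downstream task (e.g., cancer type prediction).
    print("lowest val_loss :", by_val_loss)
    print("best accuracy   :", by_downstream)
```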
“…The identification of disease subtypes requires performing a clustering algorithm at some point. Even though iterative training of the clustering in a joint autoencoder loss function can overcome inconsistencies between training and downstream clustering performance [8, 19, 21], we opted for a decoupled strategy. This was to 1) avoid having too many terms in the loss function, which can confuse training, and 2) reduce the computation time and initialization burden of iteratively training the clustering in a joint loss function.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…In our study, clustering is done as an additional step on the latent features to guarantee the optimal performance of the deconfounding autoencoder. However, other researchers have pointed out the inconsistency between validation loss and downstream clustering performance [34] and proposed to iteratively train the clustering in a joint loss function with the autoencoder [35, 36]. We chose the standalone clustering strategy over the iterative one because 1) we want to avoid having too many terms in the loss function, which could confuse the training process, and 2) the iterative clustering relies on a good initialization and is computationally expensive.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
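A minimal sketch of the decoupled strategy described in the two excerpts above, assuming a trained encoder (as in the earlier VAE sketch) and scikit-learn: clustering is run once on the frozen latent features rather than optimized inside the autoencoder loss. The variable names, the number of clusters, and the use of k-means are illustrative assumptions, not the exact pipelines of the citing papers.

```python
# Decoupled clustering sketch: cluster frozen latent features after training,
# instead of adding a clustering term to the autoencoder loss.
# `vae` and `expression_tensor` are assumed to exist from earlier training.
import torch
from sklearn.cluster import KMeans

vae.eval()
with torch.no_grad():
    h = vae.enc(expression_tensor)
    latent_mu = vae.mu(h)            # use the posterior means as features

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)  # cluster count is illustrative
subtype_labels = kmeans.fit_predict(latent_mu.cpu().numpy())
```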
“…The performances of the methods were estimated using another held-out test set comprising previously unseen data points. If, for some reason, the validation loss is not predictive of the performance at a specific downstream task – as we have previously shown can be the case for VAEs trained on RNA-Seq data [30] – then the chosen hyperparameters for a given model might not be the optimal ones for the downstream task. Our results here hint that this might be the case in our setting too.…”
Section: Discussion (citation type: mentioning)
confidence: 99%