2020
DOI: 10.1101/2020.08.12.248278
Preprint

Visualizing Population Structure with Variational Autoencoders

Abstract: Dimensionality reduction is a common tool for visualization and inference of population structure from genotypes, but popular methods either return too many dimensions for easy plotting (PCA) or fail to preserve global geometry (t-SNE and UMAP). Here we explore the utility of variational autoencoders (VAEs) – generative machine learning models in which a pair of neural networks seek to first compress and then recreate the input data – for visualizing population genetic variation. VAEs incorporate non-linear re…
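
The abstract's description of a VAE (an encoder network that compresses the input and a decoder network that reconstructs it) can be made concrete with a minimal sketch. The PyTorch implementation below is illustrative only; the layer sizes, the two-dimensional latent space, and the Bernoulli reconstruction likelihood are assumptions for demonstration, not the architecture of the paper's published tool.

```python
# Minimal VAE sketch for genotype data (illustrative, not the paper's code).
# Genotypes are assumed to be scaled to [0, 1] (e.g. 0, 0.5, 1 for diploids).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GenotypeVAE(nn.Module):
    def __init__(self, n_snps, latent_dim=2, hidden=128):
        super().__init__()
        # Encoder: genotypes -> mean and log-variance of the latent code
        self.enc = nn.Linear(n_snps, hidden)
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # Decoder: latent code -> reconstructed genotype probabilities
        self.dec1 = nn.Linear(latent_dim, hidden)
        self.dec2 = nn.Linear(hidden, n_snps)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z ~ N(mu, sigma^2) via the reparameterization trick
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        recon = torch.sigmoid(self.dec2(F.relu(self.dec1(z))))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to a standard-normal prior
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```

With a two-dimensional latent space, the encoder means can be plotted directly, which is what makes VAEs attractive for visualizing population structure.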

Cited by 7 publications (3 citation statements). References 54 publications.

“…Some early efforts used machine learning to account for issues that arise with high-dimensional summary statistics [5–7]. More recently, machine learning approaches have used various forms of convolutional, recurrent, and “deep” neural networks to improve inference and visualization [8–14]. One of the goals of moving to these approaches was to enable inference frameworks to operate on the “raw” data (genotype matrices), which avoids the loss of information that comes from reducing genotypes to summary statistics.…”
Section: Introduction
Mentioning confidence: 99%
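
The “raw data” goal above (networks that consume genotype matrices directly rather than summary statistics) might look like the following sketch: a small convolutional network over an individuals-by-SNPs matrix. The architecture, input convention, and class count are assumptions for illustration, not any cited study's model.

```python
# Illustrative CNN over a raw genotype matrix (hypothetical architecture).
import torch
import torch.nn as nn

class GenotypeCNN(nn.Module):
    def __init__(self, n_individuals, n_classes=3):
        super().__init__()
        # Treat individuals as channels and convolve along the SNP axis;
        # adaptive pooling makes the network agnostic to the number of SNPs.
        self.conv = nn.Conv1d(n_individuals, 32, kernel_size=5)
        self.pool = nn.AdaptiveMaxPool1d(1)  # pool over SNP positions
        self.fc = nn.Linear(32, n_classes)

    def forward(self, genotypes):  # shape: (batch, n_individuals, n_snps)
        h = torch.relu(self.conv(genotypes))
        return self.fc(self.pool(h).squeeze(-1))
```

Because the raw matrix is never collapsed into hand-crafted statistics, the network itself decides which features of the data are informative.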
“…We see this as an inherent problem relating to data structure. Previous comparisons of t-SNE found low fidelity with global data patterns, and latent space distances were poor proxies for ‘true’ among-group distances, particularly when compared to VAE (Becht et al. 2019; Battey et al. 2020). This potentially explains our observed ‘plateau’ of mean optimal K and SD in the t-SNE perplexity grid-search, in that perplexity defines the relative weighting of local versus global components (Wattenberg et al. 2016).…”
Section: Discussion
Mentioning confidence: 89%
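
The perplexity grid-search described in this statement can be sketched with scikit-learn's TSNE. The parameter values and toy genotype matrix below are assumptions; the cited study's actual pipeline and its procedure for choosing the optimal K are not reproduced here.

```python
# Sketch of a t-SNE perplexity grid-search (illustrative values and data).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 1000)).astype(float)  # toy 0/1/2 calls

embeddings = {}
for perplexity in (5, 15, 30, 50, 100):
    # Perplexity sets the effective neighborhood size, i.e. the relative
    # weighting of local versus global structure (Wattenberg et al. 2016).
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(genotypes)
```

Because perplexity controls the local-versus-global trade-off, embeddings across the grid can look qualitatively different on the same data, which is why downstream summaries such as an inferred K are checked for stability across values.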
“…Various approaches have been developed to approximate what an autoencoder learns. Most commonly, this involves visualisation of the latent dimensions, revealing possible clusters or regions of interest [20][21][22]. While autoencoders are frequently being applied to DNA methylation data [10,11,16], little work has been conducted on interpreting individual latent features and exploring, for example, which CpGs share a relation through common latent features.…”
Section: Introduction
Mentioning confidence: 99%
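
As this statement notes, the most common interpretation step is simply to visualise the latent space. Assuming the hypothetical GenotypeVAE and genotype array from the earlier sketch, that reduces to embedding samples with the trained encoder and scatter-plotting the latent coordinates:

```python
# Plot samples in a trained VAE's 2-D latent space (illustrative).
import matplotlib.pyplot as plt
import torch

model.eval()  # `model` is a trained GenotypeVAE from the sketch above
with torch.no_grad():
    mu, _ = model.encode(torch.as_tensor(genotypes, dtype=torch.float32))

plt.scatter(mu[:, 0].numpy(), mu[:, 1].numpy(), s=8)
plt.xlabel("latent dimension 1")
plt.ylabel("latent dimension 2")
plt.title("Samples in the learned latent space")
plt.show()
```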