2022
DOI: 10.1101/2022.04.13.488270
Preprint

Guidelines for standardising the application of discriminant analysis of principal components to genotype data

Abstract: Multivariate methods are incredibly beneficial for population genetic analyses where the number of measured variables (genetic loci) can easily exceed the number of sampled individuals. Discriminant analysis of principal components (DAPC) has become a popular method for visualising population structure in genotype datasets due to its simplicity, its computational speed, and its freedom from demographic assumptions. Despite the popularity of DAPC in population genetic studies, there has been little discussion o…

Cited by 4 publications
(14 citation statements)
References 36 publications
“…The data and scripts associated with this paper have been deposited into Dryad (Thia, 2022): https://doi.org/10.5061/dryad.b8gtht7f0.…”
Section: Data Availability Statement (mentioning)
confidence: 99%
“…The second is to select components that describe the between cluster variation and use those in a discriminant analysis (DA), which builds a model that can predict the population for each individual. The number of components selected from the PCA is critical as selecting too many axes will result in overfitting the model and create an inflated estimate of the amount of differentiation among clusters (Thia, 2022). While the developers of the DAPC method indicate the importance of this step, they do not provide a defined set of rules for selecting the components (Jombart & Collins, 2022).…”
Section: Figure (mentioning)
confidence: 99%
“…The selection of the number of PC axes has primarily relied on three approaches (Miller et al, 2020): (1) selecting the number of PC axes which explain an arbitrary amount of variance in the data set (often ≥80%), (2) the xvalDapc function, which uses a training and testing set to find the optimal trade‐off between selecting too few, and too many PC axes, or (3) the optim.a.score function, which looks at the assignment success of individuals based on different numbers of PC axes retained, and selects the number of PCs that maximizes assignment success. The study by Thia (2022) highlights caveats to consider for each of these methods. The first method often leads to inclusion of components which explain individual differences, rather than just including those which capture between population differences.…”
Section: Figure (mentioning)
confidence: 99%
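
As a rough illustration of approach (1) in the statement above, the sketch below picks the smallest number of leading PC axes whose cumulative share of variance reaches a threshold (80% by default). It is a minimal, standard-library-only sketch and not code from the cited papers: the function name `n_pcs_for_variance` and the example eigenvalues are hypothetical, and it assumes the PCA eigenvalues (variance per axis) have already been computed elsewhere.

```python
from itertools import accumulate


def n_pcs_for_variance(eigenvalues, threshold=0.80):
    """Return the smallest number of leading PC axes whose cumulative
    share of the total variance is at least `threshold`.

    `eigenvalues` holds the variance captured by each PC axis
    (hypothetical input; any non-negative numbers with a positive sum).
    """
    if not eigenvalues:
        raise ValueError("eigenvalues must be non-empty")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")

    # PC axes are conventionally ordered from largest to smallest variance.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")

    for n_axes, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return n_axes
    # Floating-point rounding could leave the last ratio just under 1.0.
    return len(ordered)


# Made-up eigenvalues, for illustration only.
example_eigenvalues = [6.0, 3.0, 0.5, 0.5]
n_retained = n_pcs_for_variance(example_eigenvalues)
print(n_retained)
```

A variance threshold of this kind is agnostic to what the retained axes describe, which is consistent with the caveat quoted above: the count it returns may include axes that capture differences among individuals rather than among populations.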