Letter to the Editor

E. I. Traboulsi

doi:10.3109/13816818409006123

Cited by 2 publications

References 2 publications

Guidelines for standardising the application of discriminant analysis of principal components to genotype data

Thia

2022

Preprint

Multivariate methods are incredibly beneficial for population genetic analyses, where the number of measured variables (genetic loci) can easily exceed the number of sampled individuals. Discriminant analysis of principal components (DAPC) has become a popular method for visualising population structure in genotype datasets due to its simplicity, its computational speed, and its freedom from demographic assumptions. Despite the popularity of DAPC in population genetic studies, there has been little discussion of best practice and parameterisation. Unappreciated, perhaps, is the fact that unlike principal component analysis (PCA), which is a hypothesis-free method, discriminant analysis (DA) is a hypothesis-driven method. That is, when performing a DA, a researcher is making an explicit hypothesis about how variation in a set of predictor variables is organised among pre-defined groups in a sample set. Parameter choice is critical to ensure that the results produced by a DA are biologically meaningful. In a DAPC, one of the most important parameter choices is the number of PC axes, p_axes, to use as predictors in a DA of among-population differences. Yet there are no clear guidelines on how researchers should choose p_axes. In this work, I propose that the value of p_axes is a deterministic feature of a genotype dataset based on population genetic theory. For k discrete populations, only the first k - 1 PC axes are expected to be biologically informative and capture population structure. DAs fit using more than the first k - 1 PC axes are over-parameterised and may discriminate groups using biologically uninformative predictors.
Using samples drawn from simulated metapopulations, I show that DAPCs parameterised with the appropriate k - 1 PC axes: (1) are more parsimonious; (2) capture the maximal amount of among-population variation using biologically relevant predictors; (3) are less sensitive to unintended interpretations of population structure; and (4) are more generally applicable to independent sample sets.
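
As a rough illustration of the k - 1 rule described in this abstract, the following Python sketch derives p_axes from a vector of population labels. The function name `choose_p_axes` and the cap at min(n - 1, number of loci) are assumptions made for this example and are not taken from the preprint; the cap only reflects the fact that a PCA of centred data cannot yield more axes with non-zero variance than that.

```python
def choose_p_axes(pop_labels, n_loci):
    """Return the number of leading PC axes to retain under the k - 1 rule.

    pop_labels: one population label per sampled individual.
    n_loci: number of genotyped loci (predictor variables before PCA).
    Hypothetical helper for illustration only.
    """
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    k = len(set(pop_labels))  # number of pre-defined populations
    if k < 2:
        raise ValueError("a DA needs at least two populations")
    n_samples = len(pop_labels)
    # PCA of centred data has at most min(n - 1, L) axes with non-zero variance.
    max_axes = min(n_samples - 1, n_loci)
    # k - 1 rule from the abstract, capped by what the PCA can supply.
    return min(k - 1, max_axes)


# Three populations of ten individuals each, genotyped at 500 loci.
labels = ["pop1"] * 10 + ["pop2"] * 10 + ["pop3"] * 10
print(choose_p_axes(labels, n_loci=500))  # prints 2
```

The returned value would then be passed as the number of retained PC axes to whatever DAPC implementation is in use.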

Guidelines for standardizing the application of discriminant analysis of principal components to genotype data

Thia

2022

Molecular Ecology Resources

The biological world is beautifully complex, characterized by variation in multiple dimensions. Multivariate statistics play a pivotal role in helping us make sense of this multidimensionality and developing a deeper appreciation of biology. Describing population genetic patterns, for example, becomes increasingly difficult with many sampled individuals, genetic markers and populations. However, ordination methods can summarize variation across multiple loci to create new synthetic axes and reduce dimensionality. Such new axes of variation
