2019
DOI: 10.1007/s11692-019-09484-8

Pathologies of Between-Groups Principal Components Analysis in Geometric Morphometrics

Abstract: Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as "high p/n," where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless…

Citations: cited by 53 publications (40 citation statements)
References: 29 publications
“…For those artificial groups, we computed the non-null singular value of the between-population matrix, $Z^{\mathrm{sc}}_{ST}/\sqrt{n-1}$, and the leading singular value of the within-population matrix, $Z^{\mathrm{sc}}_{S}/\sqrt{n-1}$. For the simulations, the separation condition was never verified, rejecting population structure in all cases (S2(A) Fig). For smaller sample sizes (n = 10 and L � 1,000), the separation condition was erroneously checked in 21% simulations, indicating that we had less power to discriminate among artificial groups with small sample sizes (S2(B) Fig). Those results were also consistent with difficulties reported for between-group PCA [46].…”
Section: Single Population Models
Citation type: supporting
confidence: 92%
“…Therefore, specific influences or factors should be more effectively isolated at specific levels of the model, as this should, in principle, reduce the effects of the common problem in PCA that leads to mixing of different effects in components if they are not orthogonal "in reality". However, it has been remarked that between-groups PCA [35] (a form of two-level mPCA) can overestimate differences between groups when sample sizes are small, because between-group variation is represented well by differences between means, but within-group variation can be underestimated. Another limitation of mPCA is that the number of non-zero eigenvalues can be constrained by the number of groups at a given level.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
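The remark about the number of non-zero eigenvalues can be illustrated for the simplest two-level case. The following is a minimal sketch, not the mPCA implementation of the citing paper: it assumes the between-group level is built from group means weighted by group size, and the function name `count_between_group_axes` and the tolerance `rel_tol` are hypothetical choices made for this example. Because the size-weighted deviations of the group means from the grand mean sum to zero, g groups can support at most g − 1 non-zero eigenvalues.

```python
import numpy as np

def count_between_group_axes(X, labels, rel_tol=1e-10):
    """Count the non-zero eigenvalues of the between-group scatter matrix.

    Hypothetical helper for illustration only. The scatter matrix is
    B = sum_g n_g (mean_g - grand_mean)(mean_g - grand_mean)^T, and its
    eigenvalues are the squared singular values of the stacked rows
    sqrt(n_g) * (mean_g - grand_mean).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    rows = []
    for g in np.unique(labels):
        members = X[labels == g]
        rows.append(np.sqrt(len(members)) * (members.mean(axis=0) - grand_mean))
    singular_values = np.linalg.svd(np.vstack(rows), compute_uv=False)
    eigenvalues = singular_values ** 2
    # Treat eigenvalues far below the largest one as numerically zero.
    return int(np.sum(eigenvalues > rel_tol * eigenvalues.max()))

# Assumed toy data: 4 groups of 10 specimens, 10 variables of pure noise.
rng = np.random.default_rng(0)
toy_labels = np.repeat(np.arange(4), 10)
toy_X = rng.standard_normal((toy_labels.size, 10))
print(count_between_group_axes(toy_X, toy_labels))
```

With four groups the count is three even though ten variables are available, so the between-group level can never supply more axes than the number of groups minus one.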
“…Although errors due to measurement and sampling must always be assessed in relation to the specific questions and statistical model, when the questions concerns small differences and crucially depend on accurate estimates of means, variances and covariances, not only sophisticated methods but also high density measurements are no easy fix for very large sampling error. In fact, more variables, as increasingly common in analyses employing semilandmarks, could inflate differences and increase the distortion of between group shape relationships (Bookstein, 2019; Cardini, O'Higgins, & Rohlf, 2019). Again, the problem is neither new nor specific to morphometrics: "Having bucketloads of data only increases the challenges in producing robust and responsible conclusions.…”
Section: Sample Size
Citation type: mentioning
confidence: 99%
“…Common solutions to deal with a large number of variables and relatively small samples are: excluding smallest samples; dimensionality reduction; the use of distance-based resampling statistics in the full shape data space. None of these remedies is perfect and it has been shown that unfavorable N/p ratios might produce spurious patterns in simulated data that only contain random isotropic noise (Bookstein, 2017, 2019; Cardini et al., 2019). Thus, for instance, a PCA might suggest dominant PCs and elliptical scatter on PC1-PC2 (e.g., fig.…”
Section: Too Few Specimens In Highly Multivariate Morphospaces?
Citation type: mentioning
confidence: 99%
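The spurious group separation described in the abstract and in the statements above can be reproduced in a few lines. This is a rough sketch under stated assumptions, not the procedure of any of the cited papers: bgPCA is taken here to mean projecting grand-mean-centred specimens onto the principal axes of the group means, the data are independent standard normal noise with arbitrary group labels, and the names `bgpca_scores`, `between_group_fraction`, and `simulate` are hypothetical.

```python
import numpy as np

def bgpca_scores(X, labels):
    """Project specimens onto the principal axes of the group means."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)  # centre on the grand mean
    groups = np.unique(labels)
    means = np.vstack([Xc[labels == g].mean(axis=0) for g in groups])
    means = means - means.mean(axis=0)  # centre the group means
    # Rows of vt are the principal axes of the group means.
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    axes = vt[: len(groups) - 1]  # at most g - 1 informative axes
    return Xc @ axes.T

def between_group_fraction(scores, labels):
    """Share of the score variance that lies between group centroids."""
    labels = np.asarray(labels)
    scores = scores - scores.mean(axis=0)
    total = np.sum(scores ** 2)
    between = sum(
        np.sum(labels == g) * np.sum(scores[labels == g].mean(axis=0) ** 2)
        for g in np.unique(labels)
    )
    return between / total

def simulate(p, n_per_group=10, n_groups=3, seed=0):
    """Run bgPCA on pure isotropic noise with arbitrary group labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_groups), n_per_group)
    X = rng.standard_normal((labels.size, p))
    return between_group_fraction(bgpca_scores(X, labels), labels)

low_pn = simulate(p=2)      # few variables relative to 30 specimens
high_pn = simulate(p=3000)  # p/n = 100, the high-p/n setting
print(f"between-group share, p=2:    {low_pn:.3f}")
print(f"between-group share, p=3000: {high_pn:.3f}")
```

No group structure exists in either data set, yet with p = 3000 nearly all of the variance in the bgPCA scores lies between the arbitrary groups, while with p = 2 only a small share does. A scatter of the high-p/n scores would show tight, well-separated clusters produced by the method alone.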