Permutation tests are the standard technique for significance testing in Analysis of Variance Simultaneous Component Analysis (ASCA). However, there are many alternative approaches to permutation testing, and the number of choices grows with the complexity of the study design. In this paper, we focus on longitudinal intervention studies with multivariate outcomes, a relevant experimental design in clinical studies where the outcome is an omics profile (such as in genomics, metabolomics, and the like). We propose a new technique to derive power curves tailored to the size and (un)balanced nature of the data set in the study. This technique is useful for identifying misleading permutation tests, namely those that lack power or yield overly optimistic outcomes. We found that choosing the best permutation approach is far from intuitive and that there is a substantial risk of deriving incorrect conclusions in real-life analyses.
Motivation: ANOVA Simultaneous Component Analysis (ASCA) is a popular method for the analysis of multivariate data yielded by designed experiments. Meaningful associations between factors/interactions of the experimental design and measured variables in the data set are typically identified via significance testing, with permutation tests being the standard go-to choice. However, in settings with large numbers of variables, the "holistic" testing approach of ASCA (all variables considered) often overlooks statistically significant effects encoded by only a few variables.
Results: We propose Variable-selection ASCA (VASCA), a method that generalizes ASCA through variable selection, augmenting its statistical power without inflating the Type-I error risk. The method is evaluated with simulations and with a real data set from a multi-omic clinical experiment. We show that VASCA is more powerful than both ASCA and the widely adopted False Discovery Rate (FDR) controlling procedure; the latter is used as a benchmark for variable selection based on multiple significance testing. We further illustrate the usefulness of VASCA for exploratory data analysis in comparison to the popular Partial Least Squares Discriminant Analysis (PLS-DA) method and its sparse counterpart (sPLS-DA).
Availability: The code for VASCA is available in the MEDA Toolbox at https://github.com/josecamachop/MEDA-Toolbox
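As an illustration of the "holistic" permutation test mentioned above, the following is a minimal Python sketch for a single factor. It is not the MEDA Toolbox implementation: the function name `asca_permutation_test`, its parameters, and the choice of statistic (the sum of squares of the factor-effect matrix over all variables) are assumptions made for this example, and permuting raw observations is only one of several possible permutation schemes.

```python
import numpy as np


def asca_permutation_test(X, labels, n_perm=1000, seed=0):
    """Permutation p-value for one factor (hypothetical helper, illustration only).

    X      : (n_observations, n_variables) data matrix
    labels : (n_observations,) level of the factor for each observation
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Remove the grand mean, as in the ANOVA decomposition of the data matrix.
    Xc = X - X.mean(axis=0)

    def effect_ss(lab):
        # Sum of squares of the factor-effect matrix: every row of that matrix
        # holds the mean profile of its level, so each level contributes
        # (number of rows in the level) * ||level mean||^2.
        ss = 0.0
        for level in np.unique(lab):
            rows = Xc[lab == level]
            ss += rows.shape[0] * np.sum(rows.mean(axis=0) ** 2)
        return ss

    observed = effect_ss(labels)

    # Null distribution: shuffle the labels across observations (assumed scheme).
    null = np.array([effect_ss(rng.permutation(labels)) for _ in range(n_perm)])

    # Counting the observed statistic itself keeps the p-value strictly above zero.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Because the statistic pools all variables, an effect carried by only a few of them can be diluted by the noise in the rest, which is the limitation that motivates variable selection in VASCA.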