Summary. Various bootstrap methods have been proposed for clustered data from one-way arrays. Simulation results in the literature suggest that some of these methods work quite well in practice; the theoretical results are limited and more mixed in their conclusions. For example, McCullagh reached negative conclusions about the use of non-parametric bootstraps for one-way arrays. This paper extends our understanding of these issues in three ways: it discusses the effect of different ways of modelling clustered data; it examines the criteria for a successful bootstrap used in the literature; and it extends the theory from functions of the sample mean to functions of the between and within sums of squares, and from non-parametric bootstraps to model-based bootstraps. We find that whether a bootstrap method gives consistent variance estimates depends on the choice of model: the residual bootstrap is consistent under the transformation model, whereas the cluster bootstrap is consistent under both the transformation model and the random-effect model. We also note that criteria based on the distribution of the bootstrap observations are of little use in assessing consistency.
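To make the two resampling schemes concrete, here is a minimal sketch for a balanced one-way array, assuming the data are stored as a list of clusters, each holding b observations y_ij. The cluster bootstrap resamples whole clusters with replacement. The residual bootstrap shown is one common two-stage variant (resample estimated cluster effects and within-cluster residuals independently, with no rescaling); it may differ in detail from the variant analysed in the paper. All function names here are hypothetical, and the example data are made up.

```python
import random
import statistics

# Assumed layout: a balanced one-way array as a list of clusters (rows),
# each row holding b observations y_ij.


def grand_mean(data):
    """Mean of all observations in the array."""
    return statistics.fmean(x for row in data for x in row)


def sums_of_squares(data):
    """Between (SSB) and within (SSW) sums of squares for a balanced array."""
    b = len(data[0])
    overall = grand_mean(data)
    row_means = [statistics.fmean(row) for row in data]
    ssb = b * sum((m - overall) ** 2 for m in row_means)
    ssw = sum((x - m) ** 2 for row, m in zip(data, row_means) for x in row)
    return ssb, ssw


def cluster_bootstrap(data, rng):
    """Cluster bootstrap: resample whole clusters (rows) with replacement."""
    return [list(rng.choice(data)) for _ in data]


def residual_bootstrap(data, rng):
    """Two-stage residual bootstrap (one common variant, no rescaling):
    resample estimated cluster effects and within-cluster residuals
    independently, then rebuild y*_ij = ybar + a*_i + e*_ij."""
    b = len(data[0])
    overall = grand_mean(data)
    row_means = [statistics.fmean(row) for row in data]
    effects = [m - overall for m in row_means]
    residuals = [x - m for row, m in zip(data, row_means) for x in row]
    sample = []
    for _ in data:
        a_star = rng.choice(effects)
        sample.append([overall + a_star + rng.choice(residuals) for _ in range(b)])
    return sample


def bootstrap_variance(data, resample, stat, n_boot=2000, seed=0):
    """Monte Carlo bootstrap estimate of Var(stat) under a resampling scheme."""
    rng = random.Random(seed)
    reps = [stat(resample(data, rng)) for _ in range(n_boot)]
    return statistics.variance(reps)


if __name__ == "__main__":
    # Toy numbers, made up for illustration only.
    data = [[4.1, 3.8, 4.4], [5.0, 5.3, 4.9], [3.2, 3.5, 3.0], [4.6, 4.2, 4.5]]
    print("SSB, SSW:", sums_of_squares(data))
    for name, scheme in [("cluster", cluster_bootstrap), ("residual", residual_bootstrap)]:
        print(name, bootstrap_variance(data, scheme, grand_mean))
```

Passing a different `stat`, for example `lambda d: sums_of_squares(d)[1]`, gives bootstrap variances for the within sum of squares instead of the grand mean. Running both schemes on data simulated from a random-effect model and from a transformation model is one way to explore the consistency claim empirically; the snippet does not reproduce the paper's analysis.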
Linear mixed effects models are highly flexible in handling a broad range of
data types and are therefore widely used in applications. A key part of data
analysis is model selection, which often aims to choose a parsimonious
model with other desirable properties from a possibly very large set of
candidate statistical models. Over the last 5-10 years the literature on model
selection in linear mixed models has grown extremely rapidly. The problem is
much more complicated than in linear regression because selecting the
covariance structure is not straightforward due to computational issues and
boundary problems arising from positive semidefinite constraints on covariance
matrices. To obtain a better understanding of the available methods, their
properties and the relationships between them, we review a large body of
literature on linear mixed model selection. We arrange, implement, discuss and
compare model selection methods based on four major approaches: information
criteria such as AIC or BIC, shrinkage methods based on penalized loss
functions such as LASSO, the Fence procedure and Bayesian techniques.
Comment: Published at http://dx.doi.org/10.1214/12-STS410 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
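As a small illustration of the first of the four approaches, the sketch below ranks candidate models by AIC = -2 log L + 2k or BIC = -2 log L + k log n, where L is the maximized likelihood and k the number of parameters. It assumes the log-likelihoods have already been obtained by fitting each candidate with ML; the function names and the convention that k counts fixed effects plus free covariance parameters are illustrative assumptions. In linear mixed models the right choice of n (observations or clusters) and the boundary problems mentioned above make these penalties less clear-cut than in linear regression.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k (smaller is better)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * loglik + k * math.log(n)


def rank_models(fits, n, criterion="bic"):
    """Rank candidate models by AIC or BIC.

    fits maps a model name to (maximized log-likelihood, number of parameters k);
    here k is assumed to count fixed effects plus free covariance parameters.
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")
    scores = {
        name: aic(ll, k) if criterion == "aic" else bic(ll, k, n)
        for name, (ll, k) in fits.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical log-likelihoods, not results from any real fit.
    fits = {"random intercept": (-100.0, 3), "random slope": (-95.0, 5)}
    print(rank_models(fits, n=1000, criterion="aic"))
    print(rank_models(fits, n=1000, criterion="bic"))
```

With these made-up numbers AIC prefers the random-slope model, while BIC, whose log n penalty is heavier for n = 1000, prefers the random-intercept model, a small example of how the choice of criterion can change the selected covariance structure.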