Model selection has an important impact on subsequent inference. Ignoring the model selection step leads to invalid inference. We discuss some intricate aspects of data-driven model selection that do not seem to have been widely appreciated in the literature. We debunk some myths about model selection, in particular the myth that consistent model selection has no effect on subsequent inference asymptotically. We also discuss an "impossibility" result regarding the estimation of the finite-sample distribution of post-model-selection estimators.
We consider the problem of estimating the conditional distribution of a post-model-selection estimator where the conditioning is on the selected model. The notion of a post-model-selection estimator here refers to the combined procedure resulting from first selecting a model (e.g., by a model selection criterion such as AIC or by a hypothesis testing procedure) and then estimating the parameters in the selected model (e.g., by least-squares or maximum likelihood), all based on the same data set. We show that it is impossible to estimate this distribution with reasonable accuracy even asymptotically. In particular, we show that no estimator for this distribution can be uniformly consistent (not even locally). This follows as a corollary to (local) minimax lower bounds on the performance of estimators for this distribution. Similar impossibility results are also obtained for the conditional distribution of linear functions (e.g., predictors) of the post-model-selection estimator.
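To make the object of study concrete, the following sketch simulates a simple pretest (post-model-selection) estimator in a Gaussian location model and looks at its distribution conditional on the unrestricted model being selected. The setup, the cutoff c, the sample size n, and the true mean theta are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: a pretest ("post-model-selection") estimator in a Gaussian
# location model. All constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, theta, c, reps = 100, 0.1, 1.96, 20000

# MLE in the unrestricted model: xbar ~ N(theta, 1/n).
xbar = rng.normal(theta, 1 / np.sqrt(n), size=reps)

# Model selection by a t-test of "theta = 0", then estimation in the
# selected model: report 0 if the restricted model is chosen.
keep = np.abs(np.sqrt(n) * xbar) > c
post = np.where(keep, xbar, 0.0)

# Conditional distribution given that the unrestricted model was selected:
# a truncated normal, with no mass inside (-c/sqrt(n), c/sqrt(n)).
cond = post[keep]
print(np.abs(cond).min())   # >= c/sqrt(n) = 0.196 by construction
```

The conditional law is a truncated normal rather than a normal; this kind of non-standard, parameter-dependent shape is what makes uniformly consistent estimation of the conditional distribution impossible.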
We point out some pitfalls related to the concept of an oracle property as used in Fan and Li [2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348-1360; 2002. Variable selection for Cox's proportional hazards model and frailty model. Annals of Statistics 30, 74-99; 2004. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association 99, 710-723] which are reminiscent of the well-known pitfalls related to Hodges' estimator. The oracle property is often a consequence of sparsity of an estimator. We show that any estimator satisfying a sparsity property has maximal risk that converges to the supremum of the loss function; in particular, the maximal risk diverges to infinity whenever the loss function is unbounded. For ease of presentation the result is set in the framework of a linear regression model, but it generalizes far beyond that setting. In a Monte Carlo study we also assess the extent of the problem in finite samples for the smoothly clipped absolute deviation (SCAD) estimator introduced in Fan and Li [2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348-1360]. We find that this estimator can perform rather poorly in finite samples and that its worst-case performance relative to maximum likelihood deteriorates with increasing sample size when the estimator is tuned to sparsity.
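For context, Hodges' estimator in the Gaussian location model is the classic minimal example of a "sparse" estimator whose maximal scaled risk diverges. The sketch below (thresholds, sample sizes, and the choice of local parameter are illustrative assumptions) evaluates the scaled risk at the unfavorable local parameter theta_n = n^(-1/4):

```python
# Hedged sketch: Hodges' estimator xbar * 1{|xbar| > n^(-1/4)} and its
# scaled risk n * E[(estimate - theta)^2], evaluated by Monte Carlo at the
# least-favorable local parameter theta_n = n^(-1/4). Constants are
# illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(1)
reps = 200000

def scaled_risk(n, theta):
    xbar = rng.normal(theta, 1 / np.sqrt(n), size=reps)
    hodges = np.where(np.abs(xbar) > n ** -0.25, xbar, 0.0)
    return n * np.mean((hodges - theta) ** 2)

for n in (100, 10_000, 1_000_000):
    # At theta_n = n^(-1/4) the scaled risk grows without bound as n grows
    # (roughly like sqrt(n)/2), although at any fixed theta it stays bounded.
    print(n, scaled_risk(n, n ** -0.25))
```

The pointwise asymptotics at any fixed theta look excellent (this is the "oracle" phenomenon), while the worst case over shrinking neighborhoods of zero deteriorates with n, which is the pitfall the abstract refers to.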
We consider the problem of estimating the unconditional distribution of a post-model-selection estimator. The notion of a post-model-selection estimator here refers to the combined procedure resulting from first selecting a model (e.g., by a model selection criterion like AIC or by a hypothesis testing procedure) and then estimating the parameters in the selected model (e.g., by least-squares or maximum likelihood), all based on the same data set. We show that it is impossible to estimate the unconditional distribution with reasonable accuracy even asymptotically. In particular, we show that no estimator for this distribution can be uniformly consistent (not even locally). This follows as a corollary to (local) minimax lower bounds on the performance of estimators for the distribution; performance is here measured by the probability that the estimation error exceeds a given threshold. These lower bounds are shown to approach 1/2 or even 1 in large samples, depending on the situation considered. Similar impossibility results are also obtained for the distribution of linear functions (e.g., predictors) of the post-model-selection estimator.
We study the distributions of the LASSO, SCAD, and thresholding estimators, in finite samples and in the large-sample limit. The asymptotic distributions are derived for both the case where the estimators are tuned to perform consistent model selection and for the case where the estimators are tuned to perform conservative model selection. Our findings complement those of Knight and Fu [K. Knight, W. Fu, Asymptotics for lasso-type estimators, Annals of Statistics 28 (2000) 1356-1378] and Fan and Li [J. Fan, R. Li, Variable selection via non-concave penalized likelihood and its oracle properties, Journal of the American Statistical Association 96 (2001) 1348-1360]. We show that the distributions are typically highly non-normal regardless of how the estimator is tuned, and that this property persists in large samples. The uniform convergence rate of these estimators is also obtained, and is shown to be slower than n^{-1/2} in case the estimator is tuned to perform consistent model selection. An impossibility result regarding estimation of the estimators' distribution function is also provided.
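The simplest case in which the non-normality can be seen directly is soft thresholding, which coincides with the LASSO under an orthonormal design. The sketch below (sample size, true parameter, and tuning value are assumed for illustration; a tuning parameter of order 1/sqrt(n) corresponds to conservative tuning) exhibits the point mass at zero:

```python
# Hedged sketch: soft thresholding (= LASSO in an orthonormal design).
# The tuning parameter lam and the true theta are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, theta, lam, reps = 100, 0.1, 0.15, 50000

# Least-squares estimate y ~ N(theta, 1/n), then shrink toward 0 by lam.
y = rng.normal(theta, 1 / np.sqrt(n), size=reps)
soft = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# The resulting law is a mixture of an atom at 0 and two shifted normal
# pieces; a normal distribution has no atoms, so this is highly non-normal.
print("P(estimate == 0) =", (soft == 0).mean())
```

The atom at zero does not vanish along local (1/sqrt(n)-scale) parameter sequences, which is why the non-normality persists in large samples.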
We give a large-sample analysis of the minimal coverage probability of the usual confidence intervals for regression parameters when the underlying model is chosen by a "conservative" (or "overconsistent") model selection procedure. We derive an upper bound for the large-sample limit minimal coverage probability of such intervals that applies to a large class of model selection procedures including the Akaike information criterion as well as various pretesting procedures. This upper bound can be used as a safeguard to identify situations where the actual coverage probability can be far below the nominal level. We illustrate that the (asymptotic) upper bound can be statistically meaningful even in rather small samples.
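A minimal simulation of the phenomenon (the design, the correlation, and the parameter values below are assumptions chosen to sit near an unfavorable case, not values from the paper): fit a two-regressor model, drop the second regressor when its t-statistic is insignificant, and report the usual 95% interval for the first coefficient from whichever model was selected.

```python
# Hedged sketch: coverage of the naive 95% CI for beta1 after a pretest on
# beta2, with known error variance sigma = 1. All constants are illustrative
# assumptions chosen to be unfavorable.
import numpy as np

rng = np.random.default_rng(3)
n, rho, reps = 100, 0.9, 5000
beta1, beta2 = 0.0, 0.34          # beta2 near the pretest's detection boundary

# Fixed correlated regressors (standardized; correlation approximately rho).
z1, z2 = rng.standard_normal((2, n))
x1 = (z1 - z1.mean()) / z1.std()
x2 = rho * x1 + np.sqrt(1 - rho**2) * (z2 - z2.mean()) / z2.std()
X = np.column_stack([x1, x2])

XtXinv = np.linalg.inv(X.T @ X)
se_full = np.sqrt(XtXinv[0, 0])       # se of beta1-hat in the full model
se2 = np.sqrt(XtXinv[1, 1])           # se of beta2-hat in the full model
se_restr = 1 / np.sqrt(x1 @ x1)       # se of beta1-hat with x2 dropped

cover = 0
for _ in range(reps):
    y = beta1 * x1 + beta2 * x2 + rng.standard_normal(n)
    bhat = XtXinv @ (X.T @ y)
    if abs(bhat[1] / se2) > 1.96:     # pretest keeps x2: full-model CI
        center, se = bhat[0], se_full
    else:                             # pretest drops x2: restricted-model CI
        center, se = (x1 @ y) / (x1 @ x1), se_restr
    cover += abs(center - beta1) < 1.96 * se

coverage = cover / reps
print("actual coverage of the nominal 95% interval:", coverage)
```

With these (assumed) values the omitted-variable bias after dropping x2 exceeds the restricted interval's half-width, so the realized coverage falls far below the nominal 0.95, illustrating why an upper bound on minimal coverage is a useful safeguard.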