Performance bounds for model selection criteria are developed using recent theory for sieves. The criteria are based on an empirical loss or contrast function with an added penalty term, motivated by empirical process theory and roughly proportional to the number of parameters needed to describe the model divided by the number of observations. Most of our examples involve density or regression estimation settings, and we focus on the problem of estimating the unknown density or regression function. We show that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve. This accuracy index quantifies the trade-off among the candidate models between the approximation error and the parameter dimension relative to sample size. If we choose a list of models which exhibit good approximation properties with respect to different classes of smoothness, the estimator can be simultaneously minimax rate optimal in each of those classes; this is what is usually called adaptation. The type of smoothness classes in which one gets adaptation depends heavily on the list of models. If too many models are involved in order to approximate many wide classes of functions accurately at the same time, it may happen that the estimator is only approximately adaptive (typically up to a slowly varying function of the sample size). We shall provide various illustrations of our method, such as penalized maximum likelihood, projection or least squares estimation. The models will involve commonly used finite-dimensional expansions such as piecewise polynomials with fixed or variable knots, trigonometric polynomials, wavelets, neural nets and related nonlinear expansions defined by superposition of ridge functions.

A. Barron et al. Work supported in part by NSF grant ECS-9410760, and by URA CNRS 1321 "Statistique et modèles aléatoires" and URA CNRS 743 "Modélisation stochastique et Statistique".
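A minimal sketch of the kind of criterion described above, assuming a trigonometric regression setting, nested models indexed by their dimension D, and a penalty constant that in practice would need calibration (all concrete choices here are illustrative, not the paper's exact penalties):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y_i = sin(2*pi*x_i) + noise.
n = 200
x = np.linspace(0.0, 1.0, n)
sigma = 0.3
y = np.sin(2 * np.pi * x) + sigma * rng.standard_normal(n)

def design(D):
    """Design matrix of the D-dimensional trigonometric model."""
    cols = [np.ones(n)]
    k = 1
    while len(cols) < D:
        cols.append(np.cos(2 * np.pi * k * x))
        if len(cols) < D:
            cols.append(np.sin(2 * np.pi * k * x))
        k += 1
    return np.column_stack(cols)

def contrast(D):
    """Empirical least squares contrast of the best fit in model D."""
    X = design(D)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2)

# Penalty roughly proportional to (model dimension) / (sample size);
# the constant uses the known noise variance, a luxury of simulation.
c = 2 * sigma ** 2
D_hat = min(range(1, 16), key=lambda D: contrast(D) + c * D / n)
```

The selected dimension is large enough to contain sin(2*pi*x) (which lives in the model of dimension 3) but not much larger, illustrating the approximation-error versus dimension trade-off that the accuracy index quantifies.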
We consider the problem of estimating s when s belongs to some separable Hilbert space ℍ and one observes the Gaussian process Y(t) = ⟨s, t⟩ + σL(t) for all t ∈ ℍ, where L is some Gaussian isonormal process. This framework allows us in particular to consider the classical "Gaussian sequence model", for which ℍ = l₂(ℕ*) and L(t) = Σ_{λ≥1} t_λ ε_λ, where (ε_λ)_{λ≥1} is a sequence of i.i.d. standard normal variables. Our approach consists in considering some at most countable families of finite-dimensional linear subspaces of ℍ (the models) and then using model selection via some conveniently penalized least squares criterion to build new estimators of s. We prove a general nonasymptotic risk bound which allows us to show that such penalized estimators are adaptive on a variety of collections of sets for the parameter s, depending on the family of models from which they are built. In particular, in the context of the Gaussian sequence model, a convenient choice of the family of models allows defining estimators which are adaptive over collections of hyperrectangles, ellipsoids, l_p-bodies or Besov bodies. We take special care to describe the conditions under which the penalized estimator is efficient when the level of noise σ tends to zero. Our construction is an alternative to the one by Efroïmovich and Low for hyperrectangles and provides new results otherwise.
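To make the sequence-model setting concrete, here is a small simulation, assuming nested models S_D spanned by the first D coordinates and a Mallows-type penalty pen(D) = K σ² D with K = 2; the paper's penalties for richer model families carry additional logarithmic terms, so this is only a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian sequence model: y_lam = s_lam + sigma * eps_lam, truncated at N.
N = 1000
sigma = 0.05
lam = np.arange(1, N + 1)
s = lam ** -1.5                 # a "smooth" signal: polynomially decaying coefficients
y = s + sigma * rng.standard_normal(N)

# Nested models S_D = span of the first D coordinates.
# Penalized least squares criterion: -sum_{lam<=D} y_lam^2 + pen(D),
# with pen(D) = K * sigma^2 * D for an illustrative constant K > 1.
K = 2.0
crit = -np.cumsum(y ** 2) + K * sigma ** 2 * lam
D_hat = int(lam[np.argmin(crit)])

# The penalized estimator keeps the first D_hat coordinates and kills the rest.
s_hat = np.where(lam <= D_hat, y, 0.0)
risk = np.sum((s_hat - s) ** 2)
```

The selected D_hat roughly balances the squared bias Σ_{λ>D} s_λ² of the truncation against the variance σ²D of the kept coordinates, which is the trade-off behind the adaptivity statements of the abstract.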
Abstract. Our purpose in this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop our point of view on this subject. The advantage and importance of model selection come from the fact that it provides a suitable approach to many different types of problems, from model selection per se (among a family of parametric models, which one is most suitable for the data at hand), which includes for instance variable selection in regression models, to nonparametric estimation, for which it provides a very powerful tool allowing adaptation under quite general circumstances. Our approach to model selection also provides a natural connection between the parametric and nonparametric points of view and copes naturally with the fact that a model is not necessarily true. The method is based on the penalization of a least squares criterion which can be viewed as a generalization of Mallows' C_p. A large part of our efforts will be devoted to choosing properly the list of models and the penalty function for various estimation problems such as classical variable selection or adaptive estimation for various types of l_p-bodies.

Introducing model selection from a nonasymptotic point of view

Choosing a proper parameter set is a difficult task in many estimation problems. Too large a set systematically leads to a large risk, while too small a set may have the same consequence, due to unduly large bias: both excessively complicated and oversimplified models should be avoided. The dilemma of choosing, among many possible models, one which is adequate for the situation at hand, depending on both the unknown complexity of the true parameter to be estimated and the known amount of noise or number of observations, is often a nightmare for the statistician.
The purpose of this paper is to provide a general methodology, namely model selection via penalization, for solving such problems within a unified Gaussian framework which covers many classical situations involving Gaussian variables.

L. Birgé: UMR 7599 "Probabilités et modèles aléatoires"
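Since the method generalizes Mallows' C_p, it may help to recall what plain C_p does for variable selection. This is a sketch assuming known noise variance and exhaustive search over subsets (feasible only for small numbers of predictors); all data and dimensions are illustrative:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Linear model with 8 candidate predictors, only 3 of which matter.
n, p = 120, 8
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 3, 5]] = [2.0, -1.5, 1.0]
sigma = 1.0
y = X @ beta + sigma * rng.standard_normal(n)

def cp(subset):
    """Mallows' C_p, known-variance form: RSS / sigma^2 + 2 * |subset|."""
    if subset:
        Xs = X[:, list(subset)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ coef) ** 2)
    else:
        rss = np.sum(y ** 2)
    return rss / sigma ** 2 + 2 * len(subset)

# Exhaustive search over all 2^p subsets of predictors.
subsets = [s for k in range(p + 1) for s in combinations(range(p), k)]
best = min(subsets, key=cp)
```

The penalty 2|subset| charges each added parameter for the variance it introduces; the paper's penalized least squares criteria replace this additive term with penalties tailored to the whole list of models.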
Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and have made it possible to derive new efficient algorithms. This text attempts to summarize some of the basic tools.
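As a toy illustration of the flavor of such bounds, one can check Hoeffding's inequality, the prototypical concentration inequality for bounded variables, against simulation (all numerical choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hoeffding's inequality: for X_1, ..., X_n i.i.d. in [0, 1] with mean mu,
#   P(|mean - mu| >= t) <= 2 * exp(-2 * n * t**2).
n, t, trials = 100, 0.1, 20000
samples = rng.uniform(0.0, 1.0, size=(trials, n))   # mu = 0.5
deviations = np.abs(samples.mean(axis=1) - 0.5)

empirical = np.mean(deviations >= t)                # Monte Carlo deviation frequency
hoeffding = 2 * np.exp(-2 * n * t ** 2)             # the theoretical bound
```

The Monte Carlo frequency sits well below the bound: Hoeffding only uses boundedness, and sharper tools (e.g. Bernstein-type inequalities exploiting the variance) tighten exactly this kind of gap.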