We consider the problem of estimating an unknown regression function in a regression framework with deterministic design points. To this end, we introduce a collection of finite-dimensional linear spaces (models) and the least-squares estimator built on a model selected from this collection in a data-driven way. This data-driven choice is performed by minimizing a penalized model selection criterion that generalizes Mallows' $C_p$. We provide nonasymptotic risk bounds for the resulting estimator, from which we deduce adaptivity properties. Our results hold under mild moment conditions on the errors. The statement and use of a new moment inequality for empirical processes are at the heart of the techniques involved in our approach.
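For orientation, recall the shape of such a criterion in this setting: writing $\hat{s}_m$ for the least-squares estimator on a model $S_m$ of dimension $D_m$, $Y$ for the observation vector and $\sigma^2$ for the error variance (our notation, not the paper's), the selected model minimizes
\[
\mathrm{crit}(m)=\|Y-\hat{s}_m\|^2+\mathrm{pen}(m),
\]
and the choice $\mathrm{pen}(m)=2\sigma^2 D_m$ recovers Mallows' $C_p$. The penalties considered here generalize this choice so as to handle rich collections of models under mild moment conditions on the errors.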
Abstract. The aim of this paper is to present a new estimation procedure that can be applied in various statistical frameworks, including density estimation and regression, and which leads to estimators that are both robust and optimal (or nearly optimal). In density estimation, the resulting estimators asymptotically coincide with the celebrated maximum likelihood estimators, at least when the statistical model is regular enough and contains the true density to be estimated. For very general models of densities, including non-compact ones, these estimators are robust with respect to the Hellinger distance and converge at the optimal rate (up to a possible logarithmic factor) in all cases we know of. In the regression setting, our approach improves upon classical least squares in many respects. In simple linear regression, for example, it provides estimators of the coefficients that are both robust to outliers and rate-optimal (or nearly rate-optimal) for a large class of error distributions, including the Gaussian, Laplace, Cauchy and uniform distributions.
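For reference, the robustness above is measured in the Hellinger distance between densities $f$ and $g$ with respect to a dominating measure $\mu$, which (up to a normalization convention that varies across authors) is
\[
h(f,g)=\left(\frac{1}{2}\int\left(\sqrt{f}-\sqrt{g}\right)^2 d\mu\right)^{1/2}.
\]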
Let $Y$ be a Gaussian vector whose components are independent with a common
unknown variance. We consider the problem of estimating the mean $\mu$ of $Y$
by model selection. More precisely, we start with a collection
$\mathcal{S}=\{S_m,m\in\mathcal{M}\}$ of linear subspaces of $\mathbb{R}^n$ and
associate with each of them the least-squares estimator of $\mu$ on $S_m$. We
then use a data-driven penalized criterion to select one estimator among
these. Our first objective is to analyze the performance of the estimators
associated with classical criteria such as FPE, AIC, BIC and AMDL. Our second
objective is to propose better penalties that are versatile enough to take into
account both the complexity of the collection $\mathcal{S}$ and the sample
size. We then apply these penalties to various statistical problems such as
variable selection, change-point detection and signal estimation.
Our results are based on a nonasymptotic risk bound with respect to the
Euclidean loss for the selected estimator. Some analogous results are also
established for the Kullback loss.
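To fix ideas, with $\hat{s}_m$ the least-squares estimator of $\mu$ on $S_m$ and $D_m=\dim(S_m)$, the classical criteria mentioned above can be written (up to minor variants) in the penalized form
\[
\mathrm{crit}(m)=n\log\left(\frac{\|Y-\hat{s}_m\|^2}{n}\right)+\mathrm{pen}(D_m),
\]
with, for instance, $\mathrm{pen}(D_m)=2D_m$ for AIC and $\mathrm{pen}(D_m)=D_m\log n$ for BIC; FPE and AMDL take related but distinct forms, and the penalties proposed in the paper replace $\mathrm{pen}(D_m)$ by quantities that also account for the complexity of the collection $\mathcal{S}$.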
Starting from the observation of an $\mathbb{R}^n$-valued Gaussian vector with
mean $f$ and covariance matrix $\sigma^2 I_n$ (where $I_n$ is the identity
matrix), we propose a method for building a Euclidean confidence ball around
$f$ with prescribed coverage probability. For each $n$, we describe its
nonasymptotic properties and show its optimality with respect to several
criteria.
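As a naive point of comparison (not the construction of the paper), writing $Y$ for the observed vector (our notation) and assuming $\sigma^2$ known, the fact that $\|Y-f\|^2/\sigma^2$ follows a $\chi^2_n$ distribution yields the exact level-$(1-\alpha)$ ball
\[
B=\left\{g\in\mathbb{R}^n:\|Y-g\|^2\le\sigma^2\,q_{1-\alpha}\right\},
\]
where $q_{1-\alpha}$ is the $(1-\alpha)$-quantile of the $\chi^2_n$ distribution. The radius of this ball is of order $\sigma\sqrt{n}$ whatever $f$ may be, which is precisely what more refined constructions aim to improve upon.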
In this paper we propose a general methodology, based on multiple testing, for testing that the mean of a Gaussian vector in $\mathbb{R}^n$ belongs to a convex set. We show that the test achieves its nominal level and characterize a class of vectors over which it achieves a prescribed power. In the functional regression model, this general methodology is applied to test qualitative hypotheses on the regression function: for example, that the regression function is positive, increasing, convex or, more generally, satisfies a differential inequality. Uniform separation rates over classes of smooth functions are established and a comparison with other results in the literature is provided. A simulation study evaluates some of the procedures for testing monotonicity.
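As a minimal illustration of the multiple-testing idea (a Bonferroni-type sketch, not the procedure of the paper), take the convex set $[0,\infty)^n$, i.e. test that all coordinates of the mean $\mu$ of $Y\sim\mathcal{N}(\mu,\sigma^2 I_n)$ are nonnegative. One may reject when
\[
\min_{1\le i\le n}\frac{Y_i}{\sigma}<\Phi^{-1}\left(\frac{\alpha}{n}\right),
\]
where $\Phi$ is the standard normal distribution function: under the null hypothesis, a union bound shows that the level is at most $\alpha$. The procedures of the paper refine this idea so as to achieve prescribed power and uniform separation rates.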
Abstract. We observe a random measure $N$ and aim at estimating its intensity $s$. This statistical framework makes it possible to deal simultaneously with the problems of estimating a density, the marginals of a multivariate distribution, the mean of a random vector with nonnegative components and the intensity of a Poisson process. Our estimation strategy is based on estimator selection: given a family of estimators of $s$ based on the observation of $N$, we propose a selection rule, itself based on $N$, in order to select among them. Few assumptions are made on the collection of estimators. The procedure makes it possible to perform model selection and also to select among estimators associated with different model selection strategies. It moreover provides an alternative to the $T$-estimators studied recently in Birgé (2006). For illustration, we consider the problems of estimation and (complete) variable selection in various regression settings.