We consider the problem of dimensionality reduction for the prediction of a target Y ∈ R explained by a covariate vector X ∈ R^p, with a particular focus on extreme values of Y, which are of primary concern for risk management. The general purpose is to reduce the dimensionality of the statistical problem through an orthogonal projection onto a lower-dimensional subspace of the covariate space. Inspired by sliced inverse regression (SIR) methods, we develop a novel framework (TIREX, Tail Inverse Regression for EXtreme response) that relies on an appropriate notion of tail conditional independence to estimate an extreme sufficient dimension reduction (SDR) space of potentially smaller dimension than a classical SDR space. We prove the weak convergence of the tail empirical processes involved in the estimation procedure and illustrate the relevance of the proposed approach on simulated and real-world data.
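The TIREX estimator itself is not spelled out in this abstract; as a rough illustration of the SIR machinery it builds on, the sketch below restricts classical sliced inverse regression to the upper tail of Y. The function name, slicing scheme, and tail fraction are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def tail_sir_directions(X, Y, tail_frac=0.1, n_slices=5, n_dirs=1):
    """Illustrative SIR-style estimator restricted to the upper tail of Y.

    Standardizes X, keeps only observations whose response exceeds the
    (1 - tail_frac) empirical quantile, slices the retained responses,
    and extracts the leading eigenvectors of the (weighted) covariance
    of the slice means of X.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized covariates
    thresh = np.quantile(Y, 1 - tail_frac)
    mask = Y >= thresh                          # keep only extreme responses
    Zt, Yt = Z[mask], Y[mask]
    order = np.argsort(Yt)                      # slice the tail by response level
    M = np.zeros((p, p))
    for s in np.array_split(order, n_slices):
        m = Zt[s].mean(axis=0)
        M += (len(s) / len(Yt)) * np.outer(m, m)
    eigvals, eigvecs = np.linalg.eigh(M)        # ascending eigenvalues
    return eigvecs[:, ::-1][:, :n_dirs]         # leading directions
```

On data where the extreme responses are driven by a single linear direction of X, the leading eigenvector recovers that direction.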
We derive sanity-check bounds for the cross-validation (CV) estimate of the generalization risk of learning algorithms dedicated to extreme or rare events. We consider classification on extreme regions of the covariate space, a problem analyzed in Jalalzai et al. (2018). The risk is then a probability of error conditional on the norm of the covariate vector exceeding a high quantile. Establishing sanity-check bounds consists in recovering bounds on the CV estimate of the same nature as those on the empirical risk. We achieve this goal both for K-fold CV, with an exponential bound, and for leave-p-out CV, with a polynomial bound, thus extending state-of-the-art results to the modified version of the risk adapted to extreme value analysis.
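The modified risk can be made concrete with a small sketch: a K-fold CV estimate of the misclassification rate computed only on validation points whose covariate norm exceeds a high empirical quantile. The `fit_predict` interface and fold construction are assumptions for illustration, not the setup of the paper.

```python
import numpy as np

def extreme_kfold_error(X, y, fit_predict, K=5, quantile=0.9):
    """K-fold CV estimate of the error rate conditional on ||X|| being large.

    fit_predict(X_train, y_train, X_test) -> predicted labels (hypothetical
    interface). Errors are counted only on validation points whose Euclidean
    norm exceeds the empirical `quantile` threshold of the full sample.
    """
    n = len(y)
    thresh = np.quantile(np.linalg.norm(X, axis=1), quantile)
    errors, total = 0, 0
    for val in np.array_split(np.arange(n), K):
        train = np.setdiff1d(np.arange(n), val)
        preds = fit_predict(X[train], y[train], X[val])
        extreme = np.linalg.norm(X[val], axis=1) > thresh   # extreme region only
        errors += np.sum(preds[extreme] != y[val][extreme])
        total += np.sum(extreme)
    return errors / max(total, 1)
```

A classifier that is exact on the extreme region attains a conditional CV error of zero, regardless of its behavior elsewhere.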
This paper investigates the efficiency of different cross-validation (CV) procedures under algorithmic stability, with a specific focus on K-fold CV. We derive a generic upper bound for the risk estimation error applicable to a wide class of CV schemes. This upper bound ensures the consistency of leave-one-out and leave-p-out CV but fails to control the error of the K-fold. We confirm this negative result with a lower bound on the K-fold error that does not converge to zero with the sample size. We thus propose a debiased version of K-fold CV which is consistent for any uniformly stable learner. We apply our results to the problem of model selection and demonstrate empirically the usefulness of the promoted approach on real-world datasets.
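For context, here is a minimal sketch of the plain (non-debiased) K-fold risk estimator and its use for model selection by risk minimization. This is the standard baseline the paper analyzes, not its debiased variant; the `fit_predict`/`loss` interfaces are illustrative assumptions.

```python
import numpy as np

def kfold_risk(X, y, fit_predict, loss, K=5):
    """Plain K-fold CV risk estimate (not the debiased variant of the paper).

    fit_predict(X_train, y_train, X_test) -> predictions; loss(y_true, y_pred)
    -> pointwise losses. Returns the average validation risk over the K folds.
    """
    n = len(y)
    fold_risks = []
    for val in np.array_split(np.arange(n), K):
        train = np.setdiff1d(np.arange(n), val)
        preds = fit_predict(X[train], y[train], X[val])
        fold_risks.append(loss(y[val], preds).mean())
    return float(np.mean(fold_risks))

def select_model(X, y, candidates, loss, K=5):
    """Model selection: pick the candidate learner minimizing the K-fold risk."""
    scores = [kfold_risk(X, y, c, loss, K) for c in candidates]
    return int(np.argmin(scores))
```

With a 0-1 loss, a candidate that predicts the labels exactly gets CV risk zero and is selected over a constant predictor.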