We study a high-dimensional linear regression model in a semi-supervised setting, where for many observations only the vector of covariates X is given, with no response Y. We make no sparsity assumptions on the vector of coefficients, nor do we assume normality of the covariates. We aim to estimate the signal level, i.e., the amount of variation in the response that can be explained by the set of covariates. We propose an estimator that is unbiased, consistent, and asymptotically normal. This estimator can be improved by adding zero-estimators arising from the unlabeled data. Adding zero-estimators does not affect the bias and can potentially reduce the variance. We further present an algorithm based on our approach that improves any given signal level estimator. Our theoretical results are demonstrated in a simulation study.
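To make the signal level concrete, here is a minimal Python sketch of one natural unbiased estimator. It assumes, purely for illustration, that the covariate distribution is known to have mean zero and identity covariance (information that a large unlabeled sample could supply), so that E[XY] = β and the signal level reduces to τ² = ‖β‖². The function name `signal_level_ustat` is hypothetical, and this is not claimed to be the authors' exact estimator.

```python
import numpy as np


def signal_level_ustat(X: np.ndarray, y: np.ndarray) -> float:
    """U-statistic estimate of tau^2 = ||beta||^2 (hypothetical sketch).

    Assumes E[X] = 0 and Cov(X) = I are known, so each W_i = X_i * y_i has
    mean beta, and W_i . W_j for i != j is unbiased for ||beta||^2.
    """
    W = X * y[:, None]            # row i holds X_i * y_i
    n = W.shape[0]
    s = W.sum(axis=0)
    # sum over i != j of W_i . W_j  =  ||sum_i W_i||^2 - sum_i ||W_i||^2
    off_diag = s @ s - np.einsum("ij,ij->", W, W)
    return float(off_diag / (n * (n - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 300, 500                          # p > n; no sparsity is used
    beta = rng.normal(scale=0.05, size=p)    # dense coefficient vector
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.standard_normal(n)
    print("estimate:", signal_level_ustat(X, y), "truth:", beta @ beta)
```

Excluding the diagonal terms i = j is what removes the bias: E[‖W_i‖²] includes the noise and covariate variance, whereas independent pairs contribute exactly ‖β‖² in expectation.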
We study a linear high-dimensional regression model in a semi-supervised setting, where for many observations only the vector of covariates X is given, with no response Y. We make no sparsity assumptions on the vector of coefficients, and we aim to estimate Var(Y|X). We propose an estimator that is unbiased, consistent, and asymptotically normal. This estimator can be improved by adding zero-estimators arising from the unlabeled data. Adding zero-estimators does not affect the bias and can potentially reduce the variance. To achieve the optimal improvement, many zero-estimators should be used, but this raises the problem of estimating many parameters. Therefore, we introduce covariate selection algorithms that identify which zero-estimators should be used to improve the above estimator. We further illustrate our approach with other estimators, and present an algorithm that improves estimation for any given variance estimator. Our theoretical results are demonstrated in a simulation study.
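The effect of adding zero-estimators can be seen on a toy problem. The sketch below estimates E[Y] rather than a variance, and assumes that E[X] = 0 is known (for example, from a large unlabeled sample), so each column of X is a zero-mean term. Subtracting a fitted combination of these terms leaves the target unchanged in expectation while removing the variance that X explains. The helper `zero_estimator_adjust` is hypothetical: it is an ordinary control-variate (regression) adjustment, not the paper's covariate selection algorithms.

```python
import numpy as np


def zero_estimator_adjust(h: np.ndarray, G: np.ndarray) -> float:
    """Control-variate adjustment of mean(h) using zero-mean terms G.

    h: (n,) per-observation terms whose mean estimates the target.
    G: (n, k) per-observation terms whose expectation is known to be zero.
    c is fitted by least squares of centered h on centered G. For a fixed c
    the adjustment is exactly unbiased; fitting c on the same data may add
    a small finite-sample bias in general.
    """
    Gc = G - G.mean(axis=0)
    hc = h - h.mean()
    c, *_ = np.linalg.lstsq(Gc, hc, rcond=None)
    # mean(G) @ c is (approximately) a zero-estimator, so the mean is preserved
    return float(h.mean() - G.mean(axis=0) @ c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, reps = 200, 5, 500
    naive, adjusted = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))          # E[X] = 0 assumed known
        y = 1.0 + X @ np.ones(p) + rng.standard_normal(n)
        naive.append(y.mean())
        adjusted.append(zero_estimator_adjust(y, X))
    print("var naive:", np.var(naive), "var adjusted:", np.var(adjusted))
```

With k zero-estimators, k coefficients must be estimated, which illustrates why using many of them raises the estimation problem mentioned in the abstract.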
We study a high-dimensional regression setting under the assumption of a known covariate distribution. We aim to estimate the amount of variation in the response explained by the best linear function of the covariates (the signal level). In our setting, neither sparsity of the coefficient vector, nor normality of the covariates, nor linearity of the conditional expectation is assumed. We present an unbiased and consistent estimator and then improve it by using a zero-estimator approach, where a zero-estimator is a statistic whose expected value is zero. More generally, we present an algorithm based on the zero-estimator approach that, in principle, can improve any given estimator. We study some asymptotic properties of the proposed estimators and demonstrate their finite-sample performance in a simulation study.
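Why a zero-estimator can only help when its coefficient is chosen well follows from a short calculation. This is the standard control-variate argument, given here for illustration (assuming finite second moments) rather than as the paper's own derivation; T is any estimator, Z is a zero-estimator with E[Z] = 0, and c is a fixed constant.

```latex
\begin{aligned}
T_c &= T - cZ, \qquad \mathbb{E}[T_c] = \mathbb{E}[T] - c\,\mathbb{E}[Z] = \mathbb{E}[T],\\
\operatorname{Var}(T_c) &= \operatorname{Var}(T) - 2c\,\operatorname{Cov}(T,Z) + c^2\,\operatorname{Var}(Z),\\
c^\ast &= \frac{\operatorname{Cov}(T,Z)}{\operatorname{Var}(Z)}, \qquad
\operatorname{Var}(T_{c^\ast}) = \operatorname{Var}(T) - \frac{\operatorname{Cov}(T,Z)^2}{\operatorname{Var}(Z)}
= \operatorname{Var}(T)\,\bigl(1 - \rho_{T,Z}^2\bigr).
\end{aligned}
```

For a vector Z of zero-estimators the optimal coefficient becomes c* = Var(Z)^{-1} Cov(Z, T). In practice c* is unknown and must be estimated from the data.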