Bootstrapping is commonly used as a tool for non-parametric statistical inference to assess the quality of estimators in variable selection models. However, for a massive dataset, the computational requirement when using bootstrapping in variable selection models (BootVS) can be crucial. In this study, we propose a novel framework using a bag of little bootstraps variable selection (BLBVS) method with a ridge hybrid procedure to assess the quality of estimators in generalized linear models with a regularized term, such as lasso and group lasso penalties. The proposed method can be easily and naturally implemented with distributed computing, and thus has significant computational advantages for massive datasets. The simulation results show that our novel BLBVS method performs excellently in both accuracy and efficiency when compared with BootVS. Real data analyses including regression on a bike sharing dataset and classification of a lending club dataset are presented to illustrate the computational superiority of BLBVS in large-scale datasets.
Bootstrap is commonly used as a tool for non-parametric statistical inference to estimate meaningful parameters in Variable Selection Models. However, for massive dataset that has exponential growth rate, the computation of Bootstrap Variable Selection (BootVS) can be a crucial issue. In this paper, we propose the method of Variable Selection with Bag of Little Bootstraps (BLBVS) on General Linear Regression and extend it to Generalized Linear Model for selecting important parameters and assessing the computation efficiency of estimators by analyzing results of multiple bootstrap sub-samples. The proposed method best suits large datasets which have parallel and distributed computing structures. To test the performance of BLBVS, we compare it with BootVS from different aspects via numerical studies. The results of simulations show our method has excellent performance. A real data analysis, Risk Forecast of Credit Cards, is also presented to illustrate the computational superiority of BLBVS on large scale datasets, and the result demonstrates the usefulness and validity of our proposed method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.