2016
DOI: 10.1080/01621459.2015.1080709
A Subsampled Double Bootstrap for Massive Data

Abstract: The bootstrap is a popular and powerful method for assessing the precision of estimators and inferential methods. However, for massive datasets, which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation, and its feasibility is questionable even with modern parallel computing platforms. Recently, Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called BLB (Bag of Little Bootstraps) for massive data which is more computationally scalable with little sacrifice of statisti…

Cited by 39 publications (52 citation statements); references 40 publications.
“…We have presented the BLBB and SDBB as two data-subsetting procedures to approximate the BB. The BLBB and SDBB are analogous to the BLB (Kleiner et al., 2014) and the SDB (Sengupta et al., 2016). The proposed procedures have theoretical and computational properties comparable to those of their frequentist counterparts.…”
Section: Discussion
confidence: 92%
“…The SDBB is the Bayesian analogue of the subsampled double bootstrap for massive data proposed by Sengupta et al. (2016), which also provides an approximation of ξ{π_φ(·|X_n)}. In Sengupta et al. (2016), the authors claim that the SDB outperforms the BLB in some scenarios with a limited time budget, especially when it is only possible to run s < n/b little bootstraps. Therefore, we would expect the same phenomenon to occur with the BLBB and SDBB.…”
Section: Subsampled Double Bayesian Bootstrap
confidence: 99%
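For contrast with the SDB, the BLB that the statement above compares against uses the opposite trade-off: few subsets, but many inner resamples per subset. A minimal sketch follows; the subset count s, inner-resample count r, and the mean estimator are illustrative assumptions, not the cited papers' recommended settings.

```python
import numpy as np

def blb_stderr(data, s=10, r=50, gamma=0.7, rng=None):
    """Bag of Little Bootstraps sketch: for each of s random subsets of
    size b = n**gamma, run r full-size multinomial resamples, compute a
    standard-error estimate within each subset, then average across
    subsets. Here the estimator is a weighted mean."""
    rng = np.random.default_rng(rng)
    data = np.asarray(data)
    n = len(data)
    b = int(n ** gamma)
    subset_ses = []
    for _ in range(s):
        subset = rng.choice(data, size=b, replace=False)
        ests = [np.average(subset, weights=rng.multinomial(n, np.full(b, 1.0 / b)))
                for _ in range(r)]
        subset_ses.append(np.std(ests, ddof=1))  # SE estimate from this subset
    return np.mean(subset_ses)  # aggregate across the s subsets
```

Under a tight time budget, the BLB must still spend r inner resamples on every subset it touches, whereas the SDB spends one resample per subset and can therefore visit many more subsets in the same time — the scenario, per the quoted statement, in which the SDB is claimed to do better.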
“…Among the techniques developed for analyzing massive datasets, two categories are most important. One is the “split-and-conquer” method (Zhang, Duchi and Wainwright, 2013; Chen and Xie, 2014; Battey et al., 2015), and the other is the resampling-based methods (Kleiner et al., 2014; Sengupta, Volgushev and Shao, 2016). In this paper, we consider a general class of symmetric statistics (Lai and Wang, 1993; Jing and Wang, 2010) that encompasses many commonly used statistics, for example, U- and L-statistics.…”
Section: Chapter 1 General Introduction
confidence: 99%
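The “split-and-conquer” idea the statement above cites — estimate on each block of the data independently, then combine the block estimates — can be sketched in a few lines. The mean estimator and the simple-averaging combiner are illustrative assumptions; the cited papers study more general estimators and combining rules.

```python
import numpy as np

def split_and_conquer_mean(data, k=10):
    """Split-and-conquer sketch: partition the data into k blocks,
    estimate on each block independently (in parallel in practice),
    then combine the block estimates by simple averaging."""
    blocks = np.array_split(np.asarray(data), k)
    block_estimates = [np.mean(b) for b in blocks]  # per-block estimates
    return np.mean(block_estimates)                 # combined estimate
```

With equal-size blocks and a mean estimator the combined value equals the full-data mean exactly; for nonlinear estimators the combination is only approximate, which is where the theory in the cited papers comes in.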