Monte Carlo studies of bootstrap variability in ROC analysis with data dependency

Wu, Jin; Martín, Alvin F.; Kacker, Raghu N.

doi:10.1080/03610918.2018.1521974

Cited by 5 publications

(10 citation statements)

References 16 publications

“…Indeed, to reduce the bootstrap variance and ensure the computation accuracy, our prior rigorous statistical research was carried out, such as the bootstrap variability studies that took months of CPU time to determine the appropriate number of bootstrap replications, the validation study by comparing the SEs of AUC estimated using the bootstrap algorithm on large i.i.d. datasets against those computed using the well-established analytical Mann-Whitney statistic method, using the multinomial probabilities to determine which bootstrap approach in the two-layer data structure should be used, and so on [4,[10][11][17][18][19][20][21][22].…”

Section: Conclusion and Discussion (mentioning)

confidence: 99%

“…In our ROC analysis for decision making of classifiers, data samples of scores are over tens of thousands and have no parametric model to fit, the statistics of interest are mostly probabilities or a weighted sum of probabilities, and data dependency may be involved. Thus, to reduce the bootstrap variance and ensure the computation accuracy, the bootstrap variabilities were re-studied, which took months of CPU time, and the appropriate number of bootstrap replications B under the above circumstances was determined to be 2,000 [11,[18][19][20][21][22].…”

Section: The Number Of Bootstrap Replications (mentioning)

confidence: 99%
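The citation statement above fixes the number of bootstrap replications at B = 2,000 for estimating the variability of ROC statistics. As a rough illustration of what such a replication loop looks like, here is a minimal sketch that estimates the standard error of the AUC by resampling the positive and negative scores separately. It assumes i.i.d. scores and does not model the two-layer data structure or the data dependency studied by the authors. The function names `auc` and `bootstrap_se_auc`, the synthetic Gaussian scores, and the seeds are all hypothetical choices made for this example.

```python
import random
import statistics


def auc(pos, neg):
    """Mann-Whitney estimate of AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_se_auc(pos, neg, B=2000, seed=0):
    """Standard deviation of the AUC over B bootstrap replications.

    Assumption: scores are i.i.d. within each class, so each class is
    resampled with replacement independently (a plain two-sample bootstrap).
    """
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        pos_star = rng.choices(pos, k=len(pos))  # resample positives
        neg_star = rng.choices(neg, k=len(neg))  # resample negatives
        reps.append(auc(pos_star, neg_star))
    return statistics.stdev(reps)


# Synthetic scores, for illustration only (not data from the cited studies).
data_rng = random.Random(1)
pos = [data_rng.gauss(1.0, 1.0) for _ in range(40)]
neg = [data_rng.gauss(0.0, 1.0) for _ in range(40)]

auc_hat = auc(pos, neg)
se_hat = bootstrap_se_auc(pos, neg)  # B = 2,000 as in the statement above
print(f"AUC = {auc_hat:.3f}, bootstrap SE = {se_hat:.3f}")
```

Because the estimate itself varies from one bootstrap run to the next, rerunning `bootstrap_se_auc` with different seeds gives a quick, informal view of the bootstrap variability that the choice of B is meant to control.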

“…where the number of iterations N is set to be 2,000, based on our extensive studies of bootstrap variability in ROC analysis with or without data dependency for large datasets [20][21][22].…”

Section: Algorithm II (Compute Correlation Coefficient) (mentioning)

confidence: 99%

“…In ROC analysis on large datasets, our prior rigorous statistical research was conducted, such as the validation study to provide a sound foundation for using the bootstrap method [17], the extensive bootstrap variability studies to determine the appropriate number of bootstrap replications without and with data dependency in order to reduce the bootstrap variance and ensure the computation accuracy, which took months of CPU time [11,[18][19][20][21][22], and so on. Our data structure and the corresponding bootstrap approach has sound scientific basis.…”

Section: Introduction (mentioning)

confidence: 99%

Standard errors and significance testing in data analysis for testing classifiers

Wu, Kacker. 2021. Self cite.

The one-classifier and two-classifier significance testing for evaluation and comparison of classifiers are conducted to investigate the statistical significance of differences and provide quantitative information in terms of the significance level, i.e., p-value, in a new ROC analysis where three score distributions and two decision thresholds are employed, and data dependency caused by multiple use of the same subjects is involved. To analyze the performance of classifiers, the standard error of the cost function is estimated using the nonparametric three-sample two-layer bootstrap algorithm on a two-layer data structure constructed after dataset optimization, based on our prior rigorous statistical research in ROC analysis on large datasets with data dependency. In comparison, the positive correlation coefficient must be taken into consideration, which is computed using a synchronized resampling algorithm; otherwise, the likelihood of detecting the statistical significance of difference between the performance levels of two classifiers can be wrongly reduced.

“…Here, a bootstrap confidence interval (1000 resamples; a random number seed of 978) was employed to generate a 95% CI of AUC. A tolerance 0.02 with 1000 replications was considered appropriate [27].…”

Section: Statistical Model Development (mentioning)

confidence: 99%

Diagnostic Accuracy with Total Adenosine Deaminase as a Biomarker for Discriminating Pleural Transudates and Exudates in a Population-Based Cohort Study

et al. 2021

Background. An initial step in the evaluation of patients with pleural effusion syndrome (PES) is to determine whether the pleural fluid is a transudate or an exudate. Objectives. To investigate total adenosine deaminase (ADA) as a biomarker to classify pleural transudates and exudates. Methods. An assay of total ADA in pleural fluids (P-ADA) was observed using a commercial kit in a population-based cohort study. Results. 157 pleural fluid samples were collected from untreated individuals with PES due to several causes. The cause most prevalent in transudate samples (21%, n = 33/157) was congestive heart failure (79%, 26/33) and that among exudate samples (71%, n = 124/157) was tuberculosis (28.0%, 44/124). There was no significant difference in the proportion of either sex between the transudate and exudate groups. The median values of P-ADA were significantly different (P < 0.0001) between both total exudates (18.4 U/L; IQR, 9.85-41.4) and exudates without pleural tuberculosis (11.0 U/L; IQR, 7.25-19.75) and transudates (6.85; IQR, 2.67-11.26). For exudates, the AUC was 0.820 (95% CI, 0.751-0.877; P < 0.001), with excellent discrimination. The optimum cut-off point in the ROC curve was determined as the level that provided the maximum positive likelihood ratio (PLR; 14.64; 95% CI, 2.11-101.9) and was 22.0 U/L. For transudates, the AUC was 0.8245 (95% CI, 0.7470-0.9020; P < 0.0001). Internal validation of the AUC after 1000 resamples was evaluated with a tolerance minor than 2%. The clinical utility was equal to 92% (95% CI, 0.84 to 0.96, P < 0.05). Conclusions. P-ADA is a useful biomarker for distinguishing pleural exudates from transudates.

Rescaled bootstrap confidence intervals for the population variance in the presence of outliers or spikes in the distribution of a variable of interest

Moya, Velasco-Muñoz, Verdejo, et al. 2020

Communications in Statistics - Simulation and Computation
