2017
DOI: 10.1002/sim.7543

A novel case‐control subsampling approach for rapid model exploration of large clustered binary data

Abstract: In many settings, an analysis goal is the identification of a factor, or set of factors associated with an event or outcome. Often, these associations are then used for inference and prediction. Unfortunately, in the big data era, the model building and exploration phases of analysis can be time-consuming, especially if constrained by computing power (ie, a typical corporate workstation). To speed up this model development, we propose a novel subsampling scheme to enable rapid model exploration of clustered bi…

Cited by 3 publications (16 citation statements)
References 26 publications
“…patients for whom $Y_{ki}=0$) and $n_{1k}$ cases (i.e. patients for whom $Y_{ki}=1$) from the $N_{0k}$ non‐cases and $N_{1k}$ cases respectively in the $k$th hospital (Wright et al., ; Haneuse and Rivera‐Rodriguez, ). To realize the gains of an ODS design, we must resolve the fact that the individuals who have ‘complete’ data are no longer representative of the underlying population.…”
Section: Outcome‐dependent Sampling (mentioning)
confidence: 99%
“…As indicated in Section 1, Wright et al. () also considered estimation and inference for a GLMM based on data collected via a CSCC sampling scheme. Briefly, let $\xi_k = \log\{P(S_{ki}=1 \mid Y_{ki}=1)/P(S_{ki}=1 \mid Y_{ki}=0)\}$.…”
Section: Outcome‐dependent Sampling (mentioning)
confidence: 99%
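
The offset in the statement above can be illustrated with a minimal Python sketch. It assumes simple random sampling within each cluster, so that the selection probabilities are $P(S_{ki}=1 \mid Y_{ki}=1) = n_{1k}/N_{1k}$ and $P(S_{ki}=1 \mid Y_{ki}=0) = n_{0k}/N_{0k}$; the function name `cscc_offset` and the example counts are hypothetical and are not taken from any of the cited papers.

```python
import math

def cscc_offset(n1k, N1k, n0k, N0k):
    """Log ratio of selection probabilities for one cluster.

    Assumes n1k of the N1k cases and n0k of the N0k non-cases are drawn
    by simple random sampling (an assumption of this sketch).
    """
    p_select_case = n1k / N1k        # P(S = 1 | Y = 1)
    p_select_noncase = n0k / N0k     # P(S = 1 | Y = 0)
    return math.log(p_select_case / p_select_noncase)

# Hypothetical clusters: (n1k, N1k, n0k, N0k)
clusters = {
    "A": (50, 50, 100, 1000),   # all cases kept, 10% of non-cases kept
    "B": (20, 40, 40, 400),     # 50% of cases kept, 10% of non-cases kept
}
offsets = {name: cscc_offset(*counts) for name, counts in clusters.items()}
```

Each sampled record in cluster $k$ would then carry the value $\xi_k$ as a fixed offset in the linear predictor.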
“…What was particularly interesting in the Wright et al. () analysis was that they did a clustered analysis, treating donor center as the cluster. With a very simple adjustment via inclusion of an offset, models could be easily fit using R functions glm or gam using only a desktop computer.…”
Section: Extensions (mentioning)
confidence: 99%
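
To show why an offset can undo case-control subsampling, here is a toy Python sketch, not the authors' implementation and not a substitute for R's `glm`. It fits an intercept-only logistic model with a fixed offset by bisection on the score equation. The function name `fit_intercept_with_offset` and all counts are hypothetical; in R, the corresponding adjustment would typically be supplied through the `offset` argument of `glm`.

```python
import math

def fit_intercept_with_offset(y, offset, lo=-30.0, hi=30.0, tol=1e-10):
    """Maximum-likelihood intercept b for logit P(Y=1) = b + offset.

    The score sum(y_i - expit(b + offset_i)) is strictly decreasing in b,
    so bisection on [lo, hi] finds its root.
    """
    def score(b):
        return sum(yi - 1.0 / (1.0 + math.exp(-(b + oi)))
                   for yi, oi in zip(y, offset))

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical cluster: 50 cases and 1000 non-cases in the full data;
# the subsample keeps all 50 cases and 100 of the non-cases.
xi = math.log((50 / 50) / (100 / 1000))
y = [1] * 50 + [0] * 100
b_hat = fit_intercept_with_offset(y, [xi] * len(y))

naive_log_odds = math.log(50 / 100)        # log-odds in the subsample
population_log_odds = math.log(50 / 1000)  # log-odds in the full data
```

In this toy setting the offset-adjusted intercept `b_hat` matches the full-data log-odds rather than the subsample log-odds, which is the intuition behind fitting on the subsample with a known offset.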