2020
DOI: 10.1002/sim.8524

An analytic framework for exploring sampling and observation process biases in genome and phenome‐wide association studies using electronic health records

Abstract: Large-scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well characterized. In this paper, we develop an analytic framework for characterizing the bias e…
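The abstract names two mechanisms: nonprobability selection into the database and phenotype misclassification. The paper's own framework is truncated above and is not reproduced here. As a toy illustration only, the sketch below computes the expected odds ratio between a binary exposure G and an EHR-derived phenotype D* among selected individuals (S = 1). It assumes that selection probabilities depend on (G, D) and that misclassification depends only on the true phenotype D. The function names and all numbers are hypothetical.

```python
import math


def odds_ratio(p1, p0):
    """Odds ratio comparing P(outcome | G = 1) = p1 with P(outcome | G = 0) = p0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def observed_or(p_d_given_g, selection, sensitivity=1.0, specificity=1.0):
    """Expected odds ratio of observed D* on G among selected (S = 1) individuals.

    Hypothetical toy model (not the paper's framework):
      p_d_given_g: dict g -> P(D = 1 | G = g) in the full population, g in {0, 1}
      selection:   function (g, d) -> P(S = 1 | G = g, D = d)
      sensitivity/specificity: P(D* = 1 | D = 1) and P(D* = 0 | D = 0),
      assumed not to depend on G (nondifferential misclassification).
    """
    p_obs = {}
    for g in (0, 1):
        p_d = p_d_given_g[g]
        w1 = p_d * selection(g, 1)          # P(D = 1, S = 1 | G = g)
        w0 = (1 - p_d) * selection(g, 0)    # P(D = 0, S = 1 | G = g)
        p_true = w1 / (w1 + w0)             # P(D = 1 | G = g, S = 1)
        # P(D* = 1 | G = g, S = 1) after misclassifying the true phenotype
        p_obs[g] = sensitivity * p_true + (1 - specificity) * (1 - p_true)
    return odds_ratio(p_obs[1], p_obs[0])


risk = {1: 0.2, 0: 0.1}                  # hypothetical P(D = 1 | G)
true_or = odds_ratio(risk[1], risk[0])   # 2.25 in the full population


def outcome_only(g, d):
    """Cases five times as likely as controls to be selected, regardless of G."""
    return 0.5 if d else 0.1


def exposure_and_outcome(g, d):
    """Exposed cases are preferentially selected."""
    return 0.6 if (g == 1 and d == 1) else 0.1


print(observed_or(risk, outcome_only))                  # ≈ 2.25
print(observed_or(risk, outcome_only, 0.9, 0.95))       # ≈ 2.0
print(observed_or(risk, exposure_and_outcome))          # ≈ 13.5
```

Selection on the true outcome alone leaves the odds ratio unchanged, which is the standard case-control property. In this example, adding nondifferential misclassification pulls the odds ratio toward 1. Selection that depends jointly on exposure and outcome can inflate it substantially.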

Cited by 14 publications (10 citation statements); references 27 publications.
“…The often lower pairwise correlations (e.g., r(PRS_TC, PRS_LDL) = 0.72 and r(PRS_eGFR, PRS_creatine) = -0.78) were expected because ExPRSs capture only a fraction of the exposure's variance (see diagonal of Figure 3C). The consistent patterns suggested that several ExPRSs can replicate correlations of measured exposures relatively well and thus might be suitable surrogates for exposures, especially for studies where measurements might not be feasible or likely be biased 50,53,54…”
Section: Correlations of ExPRSs Across Exposures (mentioning; confidence: 80%)
“…Observational studies and secondary analysis of RTs are sources of information on potential covariates for predictive models. It is also increasingly common to link high-dimensional genomic data on individuals with administrative databases containing records of clinical events for those individuals (e.g., Beesley, Fritsche & Mukherjee, 2020). As discussed in Section 4.1, it is important to adjust for selection effects in the administrative data and to consider the possibility of unobserved confounders.…”
Section: Prediction and Decisions (mentioning; confidence: 99%)
“…Note sgn(Δ) = sgn(ρ_{I,Y}) by equation 3 in the appendix, which implies the measurement error adjustment either shrinks the data quality measure toward zero or reverses its sign. While prior investigations have noted the interaction between measurement error and selection bias [Beesley et al, 2020, Beesley and Mukherjee, 2019, van Smeden et al, 2019], the interaction with the sample size relative to the population, i.e., f, has largely been ignored. The above statistical decomposition clarifies the importance of this quantity f.…”
Section: Analysis of Case-Count Data (mentioning; confidence: 99%)
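The quoted passage refers to a statistical decomposition involving ρ_{I,Y} and the sampling fraction f but does not reproduce it. Assuming it is the standard data-defect identity for a sample mean, error = ρ_{I,Y} · sqrt((1 − f)/f) · σ_Y, the sketch below checks that identity numerically on a simulated finite population. Here I is the 0/1 inclusion indicator, f = n/N is the sampling fraction, and σ_Y is the population standard deviation. The function name, prevalence, and selection probabilities are hypothetical. For a fixed correlation ρ_{I,Y}, the error grows as f shrinks, and its sign always matches the sign of ρ_{I,Y}.

```python
import math
import random


def data_defect_terms(y, included):
    """Sample-mean error and its data-defect decomposition for a finite population.

    y: population values; included: 0/1 inclusion indicators I_j.
    Population moments use divisor N, which makes the identity exact.
    """
    N = len(y)
    n = sum(included)
    f = n / N                                        # sampling fraction
    mean_y = sum(y) / N
    sample_mean = sum(v for v, i in zip(y, included) if i) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / N
    cov_iy = sum((i - f) * (v - mean_y) for v, i in zip(y, included)) / N
    rho = cov_iy / math.sqrt(f * (1 - f) * var_y)    # correlation of I with Y
    error = sample_mean - mean_y
    decomposed = rho * math.sqrt((1 - f) / f) * math.sqrt(var_y)
    return error, decomposed, rho, f


# Hypothetical population: binary phenotype with ~10% prevalence,
# cases three times as likely as non-cases to enter the database.
random.seed(2020)
N = 100_000
y = [1 if random.random() < 0.10 else 0 for _ in range(N)]
included = [1 if random.random() < (0.3 if v else 0.1) else 0 for v in y]

error, decomposed, rho, f = data_defect_terms(y, included)
print(f"f = {f:.3f}, rho = {rho:.3f}, error = {error:.4f}, decomposed = {decomposed:.4f}")
```

Because the sample-mean error equals Cov(I, Y)/f exactly, the two printed quantities agree up to floating-point rounding.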