2019
DOI: 10.1101/2019.12.26.19015859
Preprint

Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification

Abstract: Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-…

citations
Cited by 16 publications
(59 citation statements)
references
References 21 publications
“…The sample tested is, therefore, not representative of the population intended to be analyzed, and predicted case-counts and estimated parameters can deviate from the truth. While it may be possible to estimate the underlying selection model or run sensitivity analyses, it is difficult to prove that bias has been reduced or eliminated [10,5]. In the context of COVID-19 testing, such biases will continue to impact estimates of disease prevalence and the effective reproduction number until a random sample of the population is tested and/or testing becomes abundantly available.…”
Section: Introduction (mentioning)
confidence: 99%
“…External summary statistics or data are then used to address selection bias through weighting. Beesley and Mukherjee (2020) demonstrates good bias reduction and inferential performance of these methods when variables related to selection (collectively, denoted W) and phenotype misclassification (denoted X) are known and observed. In reality, drivers of selection and misclassification may not be known, and known drivers may not always be observed (e.g., income, residential information, access to health care).…”
Section: Introduction (mentioning)
confidence: 98%
“…However, these works do not address how to account for both sources of bias in a single data analysis. Recently, Beesley and Mukherjee (2020) proposed novel strategies for addressing these two sources of bias simultaneously. The work by Beesley and Mukherjee has several new features.…”
Section: Introduction (mentioning)
confidence: 99%