2017
DOI: 10.1016/j.ajhg.2017.05.014

A Fast and Accurate Algorithm to Test for Binary Phenotypes and Its Application to PheWAS

Abstract: The availability of electronic health record (EHR)-based phenotypes allows for genome-wide association analyses in thousands of traits and has great potential to enable identification of genetic variants associated with clinical phenotypes. We can interpret the phenome-wide association study (PheWAS) result for a single genetic variant by observing its association across a landscape of phenotypes. Because a PheWAS can test thousands of binary phenotypes, and most of them have unbalanced or often extremely unbal…

Cited by 128 publications (132 citation statements)
References 40 publications
“…A common goal of EHR‐based analyses is to study the associations between specific phenotypes and variants at a particular gene region or across the genome, and this analysis is often performed using linear or logistic regression or using mixed linear model association (MLMA) analysis. Firth‐corrected logistic regression may prove useful for modeling rare binary outcomes or settings in which there is strong covariate separation, and its application to PheWAS is demonstrated in Fritsche et al. Recently, Dey et al. proposed a fast alternative to Firth‐penalized regression to stabilize estimation for PheWAS studies using saddle‐point approximation (SPA) that is useful for handling extremely unbalanced case‐control data. These methods can be applied in many other modeling settings as well.…”
Section: Statistical Issues Related To Biobank Research (mentioning)
confidence: 99%
“…We consider $J$ case–control studies, where the $j$th study has sample size $n_j$. Within each individual study, we follow the regression model and testing procedure described in Dey et al. For the $i$th subject in the $j$th study, let $Y_i^{(j)} = 1$ or $0$ denote the case–control status, $X_i^{(j)}$ denote the $k \times 1$ vector of nongenetic covariates (including the intercept) and $G_i^{(j)} = 0, 1, 2$ denote the number of minor alleles of the variant to be tested.…”
Section: Methods (mentioning)
confidence: 99%
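
For orientation, the per-study setup described in this statement corresponds to a standard logistic regression with a single genotype term. The excerpt does not give the coefficient symbols, so $\beta^{(j)}$ (covariate effects) and $\gamma^{(j)}$ (genotype effect) below are assumed notation, and the display is a sketch of the usual formulation rather than a quotation from either paper:

```latex
\operatorname{logit}\,\Pr\!\left(Y_i^{(j)} = 1 \,\middle|\, X_i^{(j)}, G_i^{(j)}\right)
  = {X_i^{(j)}}^{\top} \beta^{(j)} + G_i^{(j)} \gamma^{(j)},
\qquad i = 1, \dots, n_j, \quad j = 1, \dots, J,
\qquad H_0 : \gamma^{(j)} = 0 .
```

Under this notation, the association test for the variant in study $j$ is a test of $H_0$ with the $k$ covariate effects treated as nuisance parameters.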
“…Recently, researchers at Geisinger Health System and the Michigan Genomics Initiative (MGI) conducted separate genome-wide PheWAS using clinical data from EHR[48,49]. Verma et al used PheWAS to investigate all common variants on the Illumina HumanCoreExome chip and clinical laboratory measures from ~12,000 European American individuals[48].…”
Section: Genome-wide PheWAS (mentioning)
confidence: 99%
“…Subsequently, they tested the significant SNPs from the clinical lab PheWAS with 541 diagnosis codes[48]. Dey et al demonstrated the application of a new statistical method (Table 1) for PheWAS and tested ~30 million imputed SNPs with 1500 EHR based PheWAS codes[49]. Dey et al also proposed a new method for binary outcomes, called SPAtest, which is a variation of logistic regression that estimates p-values using saddlepoint approximation.…”
Section: Genome-wide PheWAS (mentioning)
confidence: 99%