The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities

Beesley, Lauren J.; Salvatore, Maxwell; Fritsche, Lars G.; Pandit, Anita; Rao, Arvind; Brummett, Chad M.; Willer, Cristen J.; Lisabeth, Lynda D.; Mukherjee, Bhramar

doi:10.1002/sim.8445

Cited by 63 publications

(57 citation statements)

References 225 publications

(438 reference statements)

“…An earlier smoking PheWAS estimated the effects of related biases under different strengths of simulated confounding and found that even in the scenario where confounder-smoking status association was very strong (with an odds ratio of 10), there was still no evidence of inflation in false positive rate in the ever smokers [4]. In addition, we acknowledge that misclassification may have occurred at both the level of applying ICD codes and also in the automated process of converting them to phecodes [23,24]. Furthermore, MR assumes a linear effect, which would not be able to precisely capture the detrimental effects of smoking intensity if the effect is non-linear. Finally, we acknowledge that this study was carried out in participants of White-British ancestry and other studies are required to confirm these associations and their magnitude in other populations.…”

Section: Discussion (mentioning)

confidence: 88%

Mendelian randomization case-control PheWAS in UK Biobank shows evidence of causality for smoking intensity in 28 distinct clinical conditions

et al. 2020

Background: Smoking is one of the greatest threats to public health worldwide. We integrated phenome-wide association study (PheWAS) and Mendelian randomization (MR) approaches to explore causal effects of genetically predicted smoking intensity across the human disease spectrum. Methods: We conducted PheWAS case-control analyses in 152,483 ever smokers of White-British ancestry, aged 39–73 years. Disease diagnoses were based on hospital inpatient and mortality registrations. Smoking intensity was instrumented by four genetic variants, and disease risks estimated for one cigarette per day heavier intakes. Associations passing the FDR threshold (p < 0.0025) were assessed for causality using several complementary MR approaches. Findings: Genetically instrumented smoking intensity was associated with 48 conditions, with MR supporting a possible causal effect for 28 distinct outcomes. Each cigarette smoked per day elevated the odds of respiratory diseases by 5% to 33% (nine distinct diseases, including pneumonia, emphysema, obstructive chronic bronchitis, pleurisy, pulmonary collapse, respiratory failure) and the odds of circulatory disease by 5% to 23% (seven diseases, including atherosclerosis, myocardial infarction, congestive heart failure, arterial embolisms). Further effects were seen for cancer within the respiratory system and other neoplasms, renal failure, septicaemia, and retinal disorders. No associations were observed in sensitivity analyses on 185,002 never smokers. Interpretation: These genetic data demonstrate the substantial adverse health impacts by smoking intensity and suggest notable increases in the risks of several diseases. Public health initiatives should highlight the damage caused by smoking intensity and the potential benefits of reducing or ideally quitting smoking.

“…Large-scale biobanks with hundreds of thousands of genotyped and deeply phenotyped subjects are valuable resources to identify genetic components of complex phenotypes. 1,2 In biobanks, ordinal categorical data is a common type of phenotype, which is often collected from surveys, questionnaires, and testing to measure human behaviors, satisfaction, and preferences. 3,4 For example, a web questionnaire was used for 182,219 UK Biobank participants to collect 150 food and other health behavior related preferences, all of which are ordinal categorical phenotypes based on a 9-point hedonic scale of liking from 1 (extremely dislike) to 9 (extremely like).…”

Section: Main (mentioning)

confidence: 99%

Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes

Zhou, Day, et al. 2020

Preprint

Self Cite

In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, due to the lack of analysis tools, methods designed for binary and quantitative traits have often been used inappropriately to analyze categorical phenotypes, which produces inflated type I error rates or is less powerful. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, Proportional Odds Logistic Mixed Model (POLMM). POLMM is demonstrated to be computationally efficient to analyze large datasets with hundreds of thousands of genetically related samples, can control type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes on array-genotyped and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 variants (7.2%) are rare variants with MAF < 0.01.

“…As the amount of data collected on a daily basis from hospital health care system keeps increasing, [1] the appeal for leveraging the full potential of these data for research purposes and to investigate clinical questions is also becoming stronger than ever. [2][3][4][5] Yet, EHR data are quite different from research oriented data (e.g. cohort or trial data): i) they are less structured, more heterogeneous, ii) they present finer granularity, iii) data collection is done for health care purpose.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic phenotyping of electronical health record: PheVis algorithm

Ferté, Cossin, Schaeverbeke, et al. 2021

Journal of Biomedical Informatics

Electronic Health Records (EHRs) often lack reliable annotation of patient medical conditions. PheNorm, an automated unsupervised algorithm to identify patient medical conditions from EHR data, has been developed. PheVis extends PheNorm at the visit resolution. PheVis combines diagnosis codes together with medical concepts extracted from medical notes, incorporating past history in a machine learning approach to provide an interpretable "white box" predictor of the occurrence probability for a given medical condition at each visit. PheVis is applied to two real-world use-cases using the data warehouse of the University Hospital of Bordeaux: i) rheumatoid arthritis, a chronic condition; ii) tuberculosis, an acute condition (cross-validated AUROC were respectively 0.943 [0.940; 0.945] and 0.987 [0.983; 0.990]). PheVis performs well for chronic conditions, though absence of exclusion of past medical history by natural language processing tools limits its performance in French for acute conditions. It achieves significantly better performance than state-of-the-art methods especially for chronic diseases.
