High-throughput Multimodal Automated Phenotyping (MAP) with Application to PheWAS

Liao, Katherine P.; Sun, Jiehuan; Cai, Tianrun; Link, Nicholas; Hong, Chuan; Huang, Jie; Huffman, Jennifer E.; Gronsbell, Jessica; Zhang, Yichi; Ho, Yuk‐Lam; Castro, Víctor M.; Gainer, Vivian S.; Murphy, Shawn N.; O’Donnell, Christopher J.; Gaziano, J. Michael; Cho, Kelly; Szolovits, Peter; Kohane, Isaac S.; Yu, Sheng

doi:10.1101/587436

Cited by 22 publications

(45 citation statements)

References 35 publications


“…Using the PRISM features, we were able to train supervised self-learning (SSL) and transfer learning (STL) classifiers that resulted in AUC ROC of 0.97, which can be compared to the specialized computational phenotyping performances between 0.94 and 0.96 in the literature. 22, 31, 32 …”

Section: Results (mentioning)

confidence: 99%

Generative transfer learning for measuring plausibility of EHR diagnosis records

Estiri

Vasey

Murphy

2020

Journal of the American Medical Informatics Association


Objective Due to a complex set of processes involved with the recording of health information in the Electronic Health Records (EHRs), the truthfulness of EHR diagnosis records is questionable. We present a computational approach to estimate the probability that a single diagnosis record in the EHR reflects the true disease. Materials and Methods Using EHR data on 18 diseases from the Mass General Brigham (MGB) Biobank, we develop generative classifiers on a small set of disease-agnostic features from EHRs that aim to represent Patients, pRoviders, and their Interactions within the healthcare SysteM (PRISM features). Results We demonstrate that PRISM features and the generative PRISM classifiers are potent for estimating disease probabilities and exhibit generalizable and transferable distributional characteristics across diseases and patient populations. The joint probabilities we learn about diseases through the PRISM features via PRISM generative models are transferable and generalizable to multiple diseases. Discussion The Generative Transfer Learning (GTL) approach with PRISM classifiers enables the scalable validation of computable phenotypes in EHRs without the need for domain-specific knowledge about specific disease processes. Conclusion Probabilities computed from the generative PRISM classifier can enhance and accelerate applied Machine Learning research and discoveries with EHR data.



“…For common conditions such as diabetes mellitus, including the primary International Classification of Diseases, Ninth Revision or Tenth Revision (ICD‐9/10) billing code for the condition (e.g., 250.00 for diabetes mellitus without mention of complications) and primary NLP concept alone (e.g., “diabetes”) in an algorithm can achieve relatively high PPVs (13). However, for episodic or uncommon conditions that may be discussed at only a handful of visits, such as pseudogout, additional features related to the condition may be useful.…”

Section: Methods (mentioning)

confidence: 99%

Classifying Pseudogout Using Machine Learning Approaches With Electronic Health Record Data

Tedeschi

Cai

et al. 2021

Arthritis Care & Research

Self Cite


Objective Identifying pseudogout in large data sets is difficult due to its episodic nature and a lack of billing codes specific to this acute subtype of calcium pyrophosphate (CPP) deposition disease. The objective of this study was to evaluate a novel machine learning approach for classifying pseudogout using electronic health record (EHR) data. Methods We created an EHR data mart of patients with ≥1 relevant billing code or ≥2 natural language processing (NLP) mentions of pseudogout or chondrocalcinosis, 1991–2017. We selected 900 subjects for gold standard chart review for definite pseudogout (synovitis + synovial fluid CPP crystals), probable pseudogout (synovitis + chondrocalcinosis), or not pseudogout. We applied a topic modeling approach to identify definite/probable pseudogout. A combined algorithm included topic modeling plus manually reviewed CPP crystal results. We compared algorithm performance and cohorts identified by billing codes, the presence of CPP crystals, topic modeling, and a combined algorithm. Results Among 900 subjects, 123 (13.7%) had pseudogout by chart review (68 definite, 55 probable). Billing codes had a sensitivity of 65% and a positive predictive value (PPV) of 22% for pseudogout. The presence of CPP crystals had a sensitivity of 29% and a PPV of 92%. Without using CPP crystal results, topic modeling had a sensitivity of 29% and a PPV of 79%. The combined algorithm yielded a sensitivity of 42% and a PPV of 81%. The combined algorithm identified 50% more patients than the presence of CPP crystals; the latter captured a portion of definite pseudogout and missed probable pseudogout. Conclusion For pseudogout, an episodic disease with no specific billing code, combining NLP, machine learning methods, and synovial fluid laboratory results yielded an algorithm that significantly boosted the PPV compared to billing codes.
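The sensitivity and PPV figures quoted in this abstract follow the standard confusion-matrix definitions. The sketch below is illustrative only: the prevalence uses the 123 of 900 chart-reviewed subjects reported above, but the true-positive, false-positive, and false-negative counts are hypothetical values chosen to land near the reported billing-code performance (sensitivity 65%, PPV 22%); they are not taken from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the algorithm flags."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of flagged subjects that are true cases."""
    return tp / (tp + fp)


# Reported in the abstract: 123 of 900 reviewed subjects had pseudogout.
prevalence = 123 / 900

# Hypothetical counts (assumption, not from the paper), picked so that
# tp + fn equals the 123 chart-confirmed cases.
tp, fp, fn = 80, 284, 43

print(f"prevalence:  {prevalence:.1%}")
print(f"sensitivity: {sensitivity(tp, fn):.1%}")
print(f"PPV:         {ppv(tp, fp):.1%}")
```

With a low prevalence, a rule can catch most true cases and still have a low PPV, because false positives drawn from the much larger non-case group outnumber the true positives.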


“…To improve upon methods that only consider codes, machine learning tools, largely based upon NLP, have been developed to collect more phenotypic data from data sources beyond standardized codes such as textual clinical notes, textual discharge summaries and radiology reports [1, 18–21]. Liao et al. developed a multimodal automated phenotyping (MAP) algorithm to leverage both ICD codes and EMR textual narratives based on the Unified Medical Language System [18]. MAP is multimodal because it can extract entities such as ICDs, medical NLP concepts and healthcare utilization information related to a certain phenotype from both codes and free text.…”

Section: EMRs and Phenotype-genotype Association Research (mentioning)

confidence: 99%

Electronic Medical Records and Machine Learning in Approaches to Drug Development

Shinozaki

2020

Artificial Intelligence in Oncology Drug Discovery and Development


Electronic medical records (EMRs) were primarily introduced as a digital health tool in hospitals to improve patient care, but over the past decade, research works have implemented EMR data in clinical trials and omics studies to increase translational potential in drug development. EMRs could help discover phenotype-genotype associations, enhance clinical trial protocols, automate adverse drug event detection and prevention, and accelerate precision medicine research. Although feasible, data mining in EMRs still faces challenges. Existing machine learning tools may help overcome these bottlenecks in EMR mining to unlock new approaches in drug development. This chapter will explore the role of EMRs in drug development while evaluating the viability and bottlenecks of their uses in data mining. This will include discussions on EMR usage in drug development while highlighting successful outcomes in oncology and exploring ML tools to complement and enhance EMR as a widely accepted drug-research source, a section on current clinical applications of EMRs, and a conclusion to summarize and imagine what a future drug research pipeline from EMR to patient treatment may look like.
