2022
DOI: 10.1093/jamia/ocac008

A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history

Abstract: Objective: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. Material and Methods: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specifi…

Cited by 7 publications (6 citation statements)
References 27 publications
“…60 Importantly, the use of transactional data for causal inference has facilitated the discovery of disease subgroups and analysis of treatment effect heterogeneity among these groups, a feature which is invaluable to the concept of precision medicine. [61][62][63][64][65] Furthermore, EHR provide a multifaceted trove of longitudinal and temporally rich data. 60 This temporal granularity is of paramount importance for causal inference, as it enables the tracking of individual patients over time, capturing dynamic changes in exposures, interventions, and outcomes.…”
Section: Opportunities; citation type: mentioning
confidence: 99%
“…clinical trials. [28][29][30] The identification of homogeneous disease subsets and trajectories within these large datasets can support research to disease aetiology and optimise treatment, particularly in the setting of complex heterogeneous diseases. Whether a model is trained in a supervised or unsupervised manner, accurate and generalisable results are important.…”
Section: Review; citation type: mentioning
confidence: 99%
“…Unsupervised pattern recognition analyses identify subgroups of patient-patient similarity in a high dimensional or graph-based space. In rheumatology, they are most commonly employed for biological studies for instance to differentiate cell types in high-dimensional typing of blood and synovial biopsies, and are increasingly applied to clinical data from observational studies and post-hoc analyses of clinical trials 28–30. The identification of homogeneous disease subsets and trajectories within these large datasets can support research to disease aetiology and optimise treatment, particularly in the setting of complex heterogeneous diseases.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…In contrast to the previously discussed ML approaches, unsupervised learning is used for phenotype discovery, including identification of subphenotypes, [39,74,120–128] co-occurring conditions, [69,129] and disease progression patterns. [68,130–134] Among the 19 articles utilizing unsupervised learning, Latent Dirichlet Allocation (LDA) [69,124,125,127,133] and K-means were the most frequently used methods.…”
Section: Unsupervised Learning; citation type: mentioning
confidence: 99%