2019
DOI: 10.1080/01621459.2019.1686985

Bayesian Double Feature Allocation for Phenotyping With Electronic Health Records

Abstract: We propose a categorical matrix factorization method to infer latent diseases from electronic health records (EHR) data in an unsupervised manner. A latent disease is defined as an unknown biological aberration that causes a set of common symptoms for a group of patients. The proposed approach is based on a novel double feature allocation model which simultaneously allocates features to the rows and the columns of a categorical matrix. Using a Bayesian approach, available prior information on known diseases gr…

Citations: cited by 15 publications (22 citation statements)
References: 43 publications (36 reference statements)

“…Latent diseases X2 and X3 are associated with the same set of symptoms but with opposite signs. This interesting result is also found in Ni et al (2019) where X2 and X3 were identified as polycythemia and anemia, respectively. Each of X5, X6 and X8 also finds good correspondence in Ni et al (2019) as bacterial infection, viral infection and thrombocytopenia.…”
Section: Electronic Health Records Phenotyping
Citation type: supporting
confidence: 55%
“…Unlike MNIST or the application to tumor heterogeneity, there is no ground truth or alternative implementation for posterior inference for the full data. Instead we compare the results with previous results by Ni et al (2019) who used a full MCMC implementation for a subset of 1000 patients from the same dataset. Some of our findings are consistent with the earlier results, which suggests a good approximation of the proposed CMC to full MCMC.…”
Section: Electronic Health Records Phenotyping
Citation type: mentioning
confidence: 96%