2014
DOI: 10.1371/journal.pone.0087555

Redundancy-Aware Topic Modeling for Patient Record Notes

Abstract: The clinical notes in a given patient record contain much redundancy, in large part due to clinicians’ documentation habit of copying from previous notes in the record and pasting into a new note. Previous work has shown that this redundancy has a negative impact on the quality of text mining and topic modeling in particular. In this paper we describe a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, Red-LDA, which takes into account the inherent redundancy of patient records when modeling c…
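The abstract's premise is that copy-pasted note text inflates term statistics, which in turn distorts topic learning. A toy sketch (plain Python, not the paper's Red-LDA method; the note texts are hypothetical) illustrates how a pasted-forward note double-counts terms unless duplicated text is stripped first:

```python
from collections import Counter

def term_counts(notes):
    """Aggregate word counts across a patient's notes."""
    counts = Counter()
    for note in notes:
        counts.update(note.lower().split())
    return counts

# Hypothetical record: the second note copies the first and appends one line.
note1 = "patient stable on lisinopril follow up in two weeks"
note2 = note1 + " new complaint of mild cough"

raw = term_counts([note1, note2])
# Crude redundancy handling: strip the copied text before counting.
deduped = term_counts({note1, note2.replace(note1, "").strip()})

print(raw["lisinopril"])      # counted twice because of the copy-paste
print(deduped["lisinopril"])  # counted once after deduplication
```

Red-LDA itself folds redundancy awareness into the model rather than preprocessing the text this way; the sketch only shows why uncorrected counts are biased.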

Cited by 55 publications (35 citation statements)
References 16 publications
“…These models have been largely discussed for general corpora (e.g., newspaper articles), and have been developed for many uses, including word-sense disambiguation [13], topic correlation [14], learning information hierarchies [15], and tracking themes over time [16, 17]. In the biomedical domain, work has investigated the use of topic models to evaluate the impact of copy-and-pasted text on topic learning [18], better understanding and predicting Medical Subject Headings (MeSH) applied to PubMed articles [19], and exploring the correlation between Food and Drug Administration (FDA) research priorities and topics in research articles funded under those priorities [20]. Recently, topic models have been employed in the clinical domain in problems such as case-based retrieval [21]; characterizing clinical concepts over time [22]; and predicting patient satisfaction [23], depression [24], infection [25], and mortality [26].…”
Section: Introduction
confidence: 99%
“…A topic may then be sampled from the topic multinomial, which indexes individual topics from which words are drawn to generate documents. The inclusion of a Dirichlet prior has the benefit of mitigating overfitting, which is a limitation of PLSI [18]. …”
Section: Introduction
confidence: 99%
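The quoted passage summarizes LDA's generative story: per-document topic proportions are drawn from a Dirichlet prior, a topic is sampled from that multinomial for each word position, and the word is then drawn from the chosen topic's word distribution. A minimal NumPy sketch of this process (the dimensions and hyperparameters are illustrative, not tied to any cited implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, doc_len = 3, 8, 20            # topics, vocabulary size, words per document
alpha = np.full(K, 0.5)             # Dirichlet prior over topic proportions
# Topic-word distributions; each row is a multinomial over the vocabulary.
beta = rng.dirichlet(np.full(V, 0.1), size=K)

theta = rng.dirichlet(alpha)        # per-document topic multinomial
doc = []
for _ in range(doc_len):
    z = rng.choice(K, p=theta)      # sample a topic from the topic multinomial
    w = rng.choice(V, p=beta[z])    # draw a word from that topic's distribution
    doc.append(w)

print(doc)  # a synthetic document as a list of word indices in [0, V)
```

The Dirichlet prior on theta is what distinguishes LDA from PLSI: it regularizes the per-document topic proportions, which is the overfitting mitigation the citation refers to.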
“…[7,8] In the clinical domain, work has investigated the use of topic models in case-based retrieval,[1] characterizing clinical concepts over time,[9] and the impact of copy-and-pasted text on topic learning. [10] Topics have also been used as features in classifiers in order to predict patient satisfaction,[11] depression,[12] infection,[13] and mortality. [14]…”
Section: Introduction
confidence: 99%
“…Even though the goal of anonymization proposals is not to analyze or prioritize EMRs, the NLP techniques applied to recognize identifiers and quasi-identifiers are very close to the process of identifying entities for other purposes. Other existing research for EMR text pre-processing has demonstrated the possibility of extracting temporal expressions [32], [33]; correcting misspelled words [34]; resolving existing coreferences [35]; eliminating redundancy [36] and generating summaries [37].…”
Section: Background and Related Work
confidence: 99%