2021
DOI: 10.1002/acr2.11211

Text Mining of Electronic Health Records Can Accurately Identify and Characterize Patients With Systemic Lupus Erythematosus

Abstract: Objective: Electronic health records (EHR) are increasingly being recognized as a major source of reusable data for medical research and quality monitoring, although patient identification and assessment of symptoms (characterization) remain challenging, especially in complex diseases such as systemic lupus erythematosus (SLE). Current coding systems cannot assess information recorded in the physician’s free‐text notes. This study shows that text mining can be used as a reliable alternative. Methods: In a…

Cited by 10 publications (5 citation statements). References: 20 publications (26 reference statements).
“…In these studies, Labrosse [9], Brunekreef [11] and Mull [12] used a string character or rule-based text mining algorithm that was very similar to ours and resulted in analogous results. The accuracy of SLE detection was very similar to ours: 71% had a complete agreement in diagnosis in a validation sample of 100 randomly selected patients [11]. Labrosse et al [9] manually reviewed all records, and their algorithm was superior at detecting pregnancy (35 of 36) compared to manual EHR assessment (30 of 36).…”
Section: Discussion (mentioning; confidence: 98%)
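The counts in the statement above can be restated as proportions. This is plain arithmetic on the figures quoted, nothing more:

```latex
\frac{35}{36} \approx 97.2\%\ \text{(algorithm, pregnancy)}, \qquad
\frac{30}{36} \approx 83.3\%\ \text{(manual EHR assessment, pregnancy)}, \qquad
\frac{71}{100} = 71\%\ \text{(complete diagnostic agreement, SLE)}
```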
“…In contrast, text mining of several dichotomous disease states has been previously attempted, such as for pregnancy status in a sample of 344 patients [9], the presence of colorectal cancer in a sample of 1,262,671 patient reports and pathology notes [10], systemic lupus erythematosus (SLE) in a sample of 4,607 patients [11], and cardiac implantable device infections in a sample of 19,212 implant procedure patients records [12]. In these studies, Labrosse [9], Brunekreef [11] and Mull [12] used a string character or rule-based text mining algorithm that was very similar to ours and resulted in analogous results. The accuracy of SLE detection was very similar to ours: 71% had a complete agreement in diagnosis in a validation sample of 100 randomly selected patients [11].…”
Section: Discussion (mentioning; confidence: 99%)
“…However, our text mining algorithm performs very well with a very high sensitivity and specificity in detecting patients with SLE (>90%), which should be adequate in a large cohort like this.[10] Ideally, the results of this study should be validated in patients meeting the classification criteria for SLE, rather than patients with a clinical diagnosis of SLE. Our study did not include the use of immunosuppressive medication in the analysis, although some types of medication, such as belimumab and rituximab, are known to be able to influence levels of autoantibodies.…”
Section: Discussion (mentioning; confidence: 99%)
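The statement above reports sensitivity and specificity above 90%. As a reminder of how those two figures are derived from a comparison against a reference standard (for example, manual chart review), here is a minimal Python sketch. The function names and the 100-patient example are hypothetical and are not data from any of the cited studies.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(reference: Sequence[bool], predicted: Sequence[bool]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `reference` as the ground truth."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # SLE, flagged
    fp = sum(1 for r, p in pairs if not r and p)      # no SLE, flagged
    fn = sum(1 for r, p in pairs if r and not p)      # SLE, missed
    tn = sum(1 for r, p in pairs if not r and not p)  # no SLE, not flagged
    return tp, fp, fn, tn


def sensitivity_specificity(reference: Sequence[bool], predicted: Sequence[bool]) -> tuple[float, float]:
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp)."""
    tp, fp, fn, tn = confusion_counts(reference, predicted)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical validation set: 20 patients with SLE per chart review, 80 without.
    reference = [True] * 20 + [False] * 80
    predicted = [True] * 19 + [False] * 1 + [True] * 4 + [False] * 76
    sens, spec = sensitivity_specificity(reference, predicted)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.95 specificity=0.95
```

Sensitivity depends only on the reference-positive patients and specificity only on the reference-negative ones, which is why both are reported rather than a single accuracy figure.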
“…Details of the text mining algorithm used in this study have been described in detail.[10] In short, the text mining algorithm searches for pre-defined key words indicating the presence of a certain diagnosis or symptom in the written documents. Multiple diagnoses could be assigned to a single patient at the same time.…”
Section: Methods (mentioning; confidence: 99%)
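The keyword approach described in the statement above (search each document for predefined terms; allow several labels per patient) can be sketched in Python with regular expressions. The keyword lists, labels, and function names below are hypothetical illustrations, not the lexicon or code used in the study. This sketch also performs no negation handling, so a phrase such as "no nephritis" would still be flagged.

```python
from __future__ import annotations

import re
from typing import Iterable

# Hypothetical keyword lexicon, for illustration only (not the study's lists).
KEYWORDS: dict[str, list[str]] = {
    "SLE": ["systemic lupus erythematosus", "SLE"],
    "arthritis": ["arthritis"],
    "nephritis": ["lupus nephritis", "nephritis"],
}


def compile_patterns(keywords: dict[str, list[str]]) -> dict[str, re.Pattern]:
    """Build one case-insensitive, whole-word regex per label."""
    patterns = {}
    for label, terms in keywords.items():
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[label] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def label_patient(documents: Iterable[str], patterns: dict[str, re.Pattern]) -> set[str]:
    """Return every label whose keywords occur in any of the patient's documents.

    Labels are not mutually exclusive: one patient can receive several.
    """
    labels: set[str] = set()
    for text in documents:
        for label, pattern in patterns.items():
            if pattern.search(text):
                labels.add(label)
    return labels


if __name__ == "__main__":
    patterns = compile_patterns(KEYWORDS)
    notes = [
        "Patient known with SLE since 2015.",
        "Complains of arthritis in both hands.",
    ]
    print(sorted(label_patient(notes, patterns)))  # ['SLE', 'arthritis']
```

Whole-word boundaries (`\b`) keep a short term such as "SLE" from matching inside words like "sleep"; a production lexicon would need far more synonyms, abbreviations, and context rules than this sketch.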
“…On the other hand, most of the content in charts is highly redundant and useful information can be buried under duplicated notes [19]. With the advances in data extraction and mining [20][21][22], a growing body of literature uses various natural language processing techniques to extract diagnostic information [23][24][25][26][27][28]. While these models show high flexibility and adaptability, they tend to be disease-specific, which limits their scalability.…”
Section: Introduction (mentioning; confidence: 99%)