2021
DOI: 10.1186/s12911-021-01630-7

An automated data cleaning method for Electronic Health Records by incorporating clinical knowledge

Abstract: Background: The use of Electronic Health Records (EHR) data in clinical research is increasing rapidly, but the abundance of data resources makes data cleaning a challenge. Automating data cleaning can save time. In addition, automated data cleaning tools designed for data in other domains often process all variables uniformly, so they serve clinical data poorly, where variable-specific information needs to be considered. This paper prop…
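The contrast the abstract draws between uniform and variable-specific cleaning can be illustrated with a small Python sketch. It assumes each variable carries its own plausibility range; the variable names (`heart_rate`, `body_temp`, `serum_sodium`), the limits, and the helpers `clean_value` and `clean_records` are hypothetical and are not the paper's method or thresholds. Values outside a variable's range are set to missing rather than corrected.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidRange:
    low: float
    high: float


# Hypothetical, illustrative plausibility limits (not taken from the paper).
RULES = {
    "heart_rate": ValidRange(20.0, 300.0),      # beats per minute
    "body_temp": ValidRange(25.0, 45.0),        # degrees Celsius
    "serum_sodium": ValidRange(100.0, 200.0),   # mmol/L
}


def clean_value(variable, value):
    """Return value if it is plausible for this variable, else None (missing)."""
    if value is None or math.isnan(value):
        return None
    rule = RULES.get(variable)
    if rule is None:
        # No variable-specific knowledge available: leave the value unchanged.
        return value
    return value if rule.low <= value <= rule.high else None


def clean_records(records):
    """Apply the variable-specific check to (variable, value) pairs."""
    return [(var, clean_value(var, val)) for var, val in records]
```

For example, `clean_records([("heart_rate", 72.0), ("heart_rate", 720.0), ("body_temp", 98.6)])` keeps the first reading and blanks the other two: 720 bpm is implausible, and 98.6 is plausible only if the temperature was recorded in Fahrenheit, which a single rule applied uniformly to all variables could not detect.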

Cited by 16 publications (12 citation statements). References 20 publications.
“….[8] Algorithms and programs have also been designed that not only detect pre-existing errors in the process of data cleaning, but also remove and correct diagnosed errors.[14,18] Semi-automatic procedures may complement automatic procedures in data gathering that can further improve the quality of the extraction process with data cleaning.[10]…”
Section: Discussion (mentioning)
confidence: 99%
“…In general, MIMIC and eICU-CRD may be excellent benchmark databases, but we found that "real-world" data exported directly from a hospital's IT infrastructure pose many challenges that are not present in these databases. [26] presented a medical data cleaning pipeline that explicitly addresses some of the issues that we also encountered in our research. They considered laboratory tests and similar measurements and proposed manually curated validation rules for numerical variables and an automatic strategy for harmonizing (misspelled) units of measurement through fuzzy search and variable-dependent conversion rules.…”
Section: Xsl • Fo (mentioning)
confidence: 99%
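The unit-harmonization idea described in this citation statement (fuzzy matching of misspelled units, then a conversion rule that depends on the variable) might look roughly like the following Python sketch. It uses the standard library's `difflib.get_close_matches` as the fuzzy search; the variable names, the unit tables, and the helpers `harmonize_unit` and `to_canonical` are hypothetical illustrations, not the pipeline of Shi et al [26]. The conversion factors are the usual approximate ones (glucose mg/dL ÷ 18 ≈ mmol/L; creatinine µmol/L ÷ 88.4 ≈ mg/dL).

```python
import difflib

# Hypothetical variable-dependent conversion rules into one canonical unit
# per variable (glucose -> mmol/L, creatinine -> mg/dL).
CONVERSIONS = {
    ("glucose", "mmol/L"): lambda v: v,
    ("glucose", "mg/dL"): lambda v: v / 18.0,        # approximate factor
    ("creatinine", "mg/dL"): lambda v: v,
    ("creatinine", "umol/L"): lambda v: v / 88.4,    # approximate factor
}


def harmonize_unit(variable, raw_unit, cutoff=0.6):
    """Fuzzy-match a raw (possibly misspelled) unit to a known unit for this variable."""
    known = [unit for (var, unit) in CONVERSIONS if var == variable]
    # Compare case-insensitively, then map back to the canonical spelling.
    lowered = {unit.lower(): unit for unit in known}
    match = difflib.get_close_matches(
        raw_unit.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[match[0]] if match else None


def to_canonical(variable, value, raw_unit):
    """Convert value to the variable's canonical unit, or None if the unit is unrecognized."""
    unit = harmonize_unit(variable, raw_unit)
    if unit is None:
        return None
    return CONVERSIONS[(variable, unit)](value)
```

Fuzzy matching alone is risky for units: with this table, `mg/L` would be matched to `mg/dL` (similarity ratio about 0.89), a tenfold error. This is one reason to pair automatic matching with manually curated, variable-specific rules, as the quoted passage describes.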
“…They considered laboratory tests and similar measurements and proposed manually curated validation rules for numerical variables and an automatic strategy for harmonizing (misspelled) units of measurement through fuzzy search and variable-dependent conversion rules. The focus of Shi et al [26] is on improving the quality of data [27][28][29], whereas Wang et al [15], Tang et al [16], and Mandyam et al [17] are mainly concerned with transforming data into a form suitable for ML. A more detailed evaluation of FIDDLE, MIMIC-Extract, and cleaning and organization pipeline for EHR computational and analytical tasks and the approach to our data by Shi et al [26] can be found in Multimedia Appendix 1 [15][16][17]26].…”
Section: Xsl • Fo (mentioning)
confidence: 99%