2020
DOI: 10.1016/j.ijmedinf.2020.104302

Automatic classification of scanned electronic health record documents

Cited by 36 publications
(31 citation statements)
References 16 publications
“…The cause of the error can be in the form of the characters in the letter number are illegible (text characters from OCR were not recognized correctly), data does not match the provided regular expression pattern and unpredictable. This study adapts, modifies and combines the methods in previous studies (scanned document classification with OCR-assisted text approach [17]- [19], hierarchical classification [28], CNN [20]- [22], regular expression [23]- [25] and framework Hadoop [29] which in the end this proposed method is able to overcome the problem of classifying scanned documents (using a text-based approach with the help of OCR) at a depth of 4 levels automatically in a hierarchical manner that is able to classify different document types with document conditions that have unstructured text content using CNN and have special patterns (specific and short strings) using regular expression and implementation of big data technology using Hadoop framework for store and analysis of large-scale data. This method is powerful and effective to overcome the multilevel classification problem in the case of this electronic mail document.…”
Section: Results (mentioning)
confidence: 99%
“…Based on small trial, the accuracy performance of Google Vision OCR was the best comparing to other OCR tools [16]. In previous studies, the automatic classification of scanned electronic health record documents done by extracted text using (OCR and multiple text classification machine learning models, including both "bag of words" and deep learning approaches [17], the classifying image spam detection using OCR, machine learning and natural language processing [18] and the classifying promotion images using OCR and Naïve Bayes classifier [19]. From research [17]- [19] show that text-based classification systems can accurately classify scanned documents.…”
Section: Introduction (mentioning)
confidence: 99%
“…As others have noted, the literature devoted to scanned documents and images within EHRs is smaller than we expected given the importance of this commonly used means for HIE in the early decades of EHR use in our country. 18 Our study is limited by its small size-it is a pilot-and by the population that we used which is from an academic center. The number of cancer risk factors identified in scanned records may be different in other populations.…”
Section: Discussion (mentioning)
confidence: 99%
“…The paper suggests that a more accurate engine must be used to recognize cursive handwriting to improve accuracy. The paper [6] proposes a system to group the clinical and nonclinical documents into suitable categories which are again subclassified. Electronic Health Records have also known as (EHR's) contain a large number of scanned documents such as radiology reports, clinical correspondence, identification cards, etc.…”
Section: Literature Survey (mentioning)
confidence: 99%