2020 IEEE 12th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environ 2020
DOI: 10.1109/hnicem51456.2020.9400000

OCR Based Document Archiving and Indexing Using PyTesseract: A Record Management System for DSWD Caraga, Philippines

Cited by 10 publications (3 citation statements). References: 4 publications.

“…This preprocessing performs noise reduction, contrast enhancement, and resizing tasks to ensure optimal recognition accuracy. Following preprocessing, it uses the deep learning model, which identifies characters, words, the spatial position of the characters, and even complex layouts within the image [35]. Once the text is extracted from the image, postprocessing techniques are applied to enhance the accuracy of the recognized text, such as spell-checking and formatting correction.…”
Section: Optical Character Recognition (mentioning; confidence: 99%)
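The preprocessing and postprocessing steps described in this statement can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the image is assumed to be a grayscale list of pixel rows, noise reduction is assumed to be a 3×3 median filter, contrast enhancement a min-max stretch, resizing nearest-neighbour upscaling, and spell-checking a closest-match lookup with Python's `difflib.get_close_matches` against a hypothetical vocabulary. All function names are hypothetical.

```python
from difflib import get_close_matches
from statistics import median

# Assumption: a grayscale image stored as rows of pixel values, 0 (black) to 255 (white).
Image = list[list[int]]


def denoise(img: Image) -> Image:
    """Noise reduction (assumed): 3x3 median filter; border pixels use the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out


def stretch_contrast(img: Image) -> Image:
    """Contrast enhancement (assumed): min-max stretch to the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def resize(img: Image, scale: int) -> Image:
    """Resizing (assumed): nearest-neighbour upscaling by an integer factor."""
    return [[p for p in row for _col in range(scale)]
            for row in img for _row in range(scale)]


def preprocess(img: Image, scale: int = 2) -> Image:
    """Run the three assumed preprocessing steps before handing the image to OCR."""
    return resize(stretch_contrast(denoise(img)), scale)


def spell_correct(text: str, vocabulary: list[str]) -> str:
    """Postprocessing (assumed): replace each word with its closest vocabulary entry, if close enough."""
    corrected = []
    for word in text.split():
        match = get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.8)
        corrected.append(match[0] if match else word)  # keep the word when nothing is close
    return " ".join(corrected)
```

For example, `spell_correct("Recod managment", ["record", "management", "system"])` maps both misspellings to their vocabulary entries. The 0.8 cutoff is an arbitrary choice; a lower value corrects more aggressively at the risk of replacing valid words.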
“…For this, we used Python-tesseract, an Optical Character Recognition (OCR) tool for Python. This tool recognizes and extracts text embedded in images and is a wrapper for Google's Tesseract-OCR Engine (Jayoma et al., 2020). Through its use, a second JSON is generated containing the extracted words and the frame number where the words were identified.…”
Section: Feature Extraction (mentioning; confidence: 99%)
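The JSON described here, pairing extracted words with the frame numbers where they were found, could be built along the following lines. The schema (each lowercased word mapped to a sorted list of frame numbers) and the function name `build_word_index` are assumptions; the statement does not give the actual format. In a real pipeline the per-frame text might come from `pytesseract.image_to_string(frame)`; here it is passed in as a plain dictionary so the sketch runs without Tesseract installed.

```python
import json


def build_word_index(frame_texts: dict[int, str]) -> str:
    """Map each extracted word to the sorted frame numbers where it appears (assumed schema)."""
    index: dict[str, set[int]] = {}
    for frame, text in frame_texts.items():
        for word in text.split():
            # Normalize: drop surrounding punctuation and ignore case.
            token = word.strip(".,;:!?\"'()").lower()
            if token:
                index.setdefault(token, set()).add(frame)
    # Sort words and frame numbers so the JSON output is deterministic.
    return json.dumps({w: sorted(f) for w, f in sorted(index.items())}, indent=2)
```

Calling `build_word_index({0: "Hello world", 12: "hello, DSWD"})` yields a JSON object in which `hello` lists frames 0 and 12. Storing sets during construction avoids duplicate frame numbers when a word occurs several times in the same frame.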
“…Python is the engine that uses the PyTesseract library, one of the important libraries used for Arabic OCR; Python is open source, and all Python libraries are easy to implement. The Tesseract-OCR Engine is also used to detect text in images at the line, word, and character level [3]. Optical character recognition converts the images to editable text with the .txt extension; an edit.py file is then created to compare the predicted text against the ground-truth text and check the accuracy of Tesseract OCR in recognizing characters. Running edit.py from the command line reports how many characters are recognized incorrectly, counts the recognition errors, and issues the final accuracy, which is 99.58%.…”
Section: Introduction (mentioning; confidence: 99%)
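The statement does not say how edit.py computes its accuracy figure. One common choice, assumed in this sketch, is character accuracy = (1 − edit distance / number of ground-truth characters) × 100, where the Levenshtein edit distance counts character insertions, deletions, and substitutions. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def character_accuracy(predicted: str, truth: str) -> float:
    """Assumed metric: percentage of ground-truth characters recognized correctly."""
    if not truth:
        return 100.0 if not predicted else 0.0
    errors = edit_distance(predicted, truth)
    # Clamp at 0: heavy over-insertion can make errors exceed len(truth).
    return max(0.0, 100.0 * (1 - errors / len(truth)))
```

To compare a predicted .txt file with its ground-truth counterpart, read both files into strings and pass them to `character_accuracy`. The two-row dynamic program keeps memory proportional to the length of one string, which matters for full-page transcripts.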