Administrative work in an institution is largely carried out through paper-based mail and documents. Managing and archiving them therefore takes considerable effort, including providing storage space and maintaining a categorization system. Digitizing documents by scanning them into digital images is one way to reduce the effort of archiving and categorizing. Digitization also enables search through metadata that is entered manually during the digitization process; the metadata can contain the document title, a summary, or a category. The need to enter this metadata manually can be removed by using Optical Character Recognition (OCR), which converts the text in a document image into machine-readable text stored in a database. This research focuses on implementing an OCR system that extracts text from scanned document images and on optimizing one pre-processing stage, image thresholding. The aim of the optimization is to increase OCR accuracy by tuning the threshold value over a given set of candidate values; a threshold of 0.6 gave the best result. Experiments on text extraction from several scanned documents achieved an accuracy of 92.568%.
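The threshold tuning described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the candidate set `(0.4, 0.5, 0.6, 0.7)`, the `ocr` callable, and the similarity-based accuracy measure are all assumptions made for illustration, since the abstract does not specify the value set or the accuracy formula. The threshold is assumed to be normalized to the range 0 to 1 and applied to 8-bit grayscale pixels.

```python
from difflib import SequenceMatcher


def binarize(gray, threshold=0.6):
    """Global thresholding on an 8-bit grayscale image (list of rows).

    Pixels brighter than threshold * 255 become white (255), the rest black (0).
    The normalized 0..1 threshold scale is an assumption.
    """
    cutoff = threshold * 255
    return [[255 if px > cutoff else 0 for px in row] for row in gray]


def char_accuracy(recognized, expected):
    """Text similarity in percent.

    A stand-in metric (difflib ratio); the paper's own accuracy
    formula is not given in the abstract.
    """
    return 100.0 * SequenceMatcher(None, recognized, expected).ratio()


def best_threshold(gray, expected, ocr, candidates=(0.4, 0.5, 0.6, 0.7)):
    """Binarize with each candidate threshold, run OCR, keep the best.

    `ocr` is a hypothetical callable that maps a binarized image to a string;
    `candidates` is an illustrative value set, not the one used in the paper.
    Returns (threshold, accuracy_percent).
    """
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = char_accuracy(ocr(binarize(gray, t)), expected)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

In practice `ocr` would wrap a real OCR engine and `expected` would be a manually transcribed ground-truth text for the scanned page; the sweep then reports which threshold yields the most accurate extraction.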