Every day, millions of files containing vast amounts of unstructured data are generated worldwide. Most of us come across at least one new document every week, which indicates the large volume of data held in documents. Because this data is unstructured, it is difficult to process further, and extracting it from documents still remains largely a manual effort, resulting in long processing times. A system that automatically extracts the required fields from documents and stores them in a structured format would therefore be highly valuable. In this paper, we describe an approach for extracting data from Exam Result Gazette documents and storing it in a CSV file. A Mask R-CNN model with a ResNeXt-101-32x8d backbone and a Feature Pyramid Network (FPN) has been tuned to detect the required fields. PyTesseract, a Python wrapper for the Tesseract optical character recognition (OCR) engine, is then used to extract the text from the detected fields. Our system is trained on a custom dataset that we created and then evaluated on test data. The overall accuracy of our system is 98.69%. The results indicate that the system can efficiently extract the required fields from exam result gazette documents.
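The detect, recognize, and store pipeline described above can be illustrated with a minimal sketch of its final stage: turning field detections into CSV rows. This is not the authors' code. The `Detection` class, the field names (`seat_no`, `name`), the confidence threshold, and the heuristic of grouping fields into one student record by vertical position are all hypothetical assumptions. The Mask R-CNN detector is assumed to have already produced labelled boxes, and OCR is passed in as a callable; with a PIL page image it could be something like `lambda box: pytesseract.image_to_string(page.crop(box))`.

```python
import csv
from dataclasses import dataclass
from typing import Callable, List, Sequence, TextIO, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Detection:
    label: str    # predicted field class (hypothetical names, e.g. "seat_no")
    box: Box      # bounding box of the detected field
    score: float  # detector confidence in [0, 1]


def _centre_y(d: Detection) -> float:
    return (d.box[1] + d.box[3]) / 2


def group_into_rows(dets: Sequence[Detection], tol: int = 10) -> List[List[Detection]]:
    """Assumed layout heuristic: fields whose vertical centres lie within
    `tol` pixels of a row's first field belong to the same record."""
    rows: List[List[Detection]] = []
    for d in sorted(dets, key=_centre_y):
        if rows and abs(_centre_y(d) - _centre_y(rows[-1][0])) <= tol:
            rows[-1].append(d)
        else:
            rows.append([d])
    return rows


def detections_to_csv(
    dets: Sequence[Detection],
    ocr: Callable[[Box], str],  # returns the text inside a box (e.g. via Tesseract)
    fields: Sequence[str],
    out: TextIO,
    min_score: float = 0.5,     # hypothetical confidence cut-off
) -> None:
    """Write one CSV row per grouped record; missing fields stay empty."""
    writer = csv.DictWriter(out, fieldnames=list(fields))
    writer.writeheader()
    # Drop low-confidence detections and labels we are not extracting.
    kept = [d for d in dets if d.score >= min_score and d.label in fields]
    for row in group_into_rows(kept):
        record = {f: "" for f in fields}
        for d in row:
            # If a label appears twice in one row, the later box wins.
            record[d.label] = ocr(d.box).strip()
        writer.writerow(record)


if __name__ == "__main__":
    import io

    # Stand-in for real OCR output, keyed by box.
    fake_text = {(10, 100, 90, 120): "B123", (100, 102, 300, 122): "A. Student",
                 (10, 200, 90, 220): "B124"}
    dets = [Detection("seat_no", (10, 100, 90, 120), 0.98),
            Detection("name", (100, 102, 300, 122), 0.95),
            Detection("seat_no", (10, 200, 90, 220), 0.97)]
    buf = io.StringIO()
    detections_to_csv(dets, lambda b: fake_text[b], ["seat_no", "name"], buf)
    print(buf.getvalue())
```

Injecting OCR as a callable keeps the grouping and CSV logic testable without a Tesseract installation; a real system would also need a layout rule matched to the actual gazette format rather than the simple row tolerance assumed here.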