November The offer of online, automated, and impersonal services demand users to upload scanned copies of their documents to the organisations. As a consequence of this decentralisation, the documents present more challenges to the already complex process of image processing and information extraction. To address this problem, the authors presented an optimised fully convolutional neural network model for document segmentation that works on mobile devices to detect the region of the document in the captured image. They performed experiments in three representative datasets comparing the proposed method with the Geodesic object Proposals, U-net, Mask R-CNN, and OctHU-PageScan algorithms. They also compared the proposed model with all competitors of the ICDAR2015 Competition on smartphone document capture. Furthermore, they performed a qualitative and comparative analysis with the CamScanner software, a popular app for Android and iOS smartphones used for more than 100 million users in over 200 countries. The proposed approach achieved a significant performance compared with the current state-of-the-art methods, providing a powerful approach for document segmentation in photos and scanned images.
The digital relationship between companies and customers happens through online systems where consumers must upload their identification documents pictures to prove their identities. The existence of this large volume of document images encourages the research development to generate image processing systems to automate tasks usually performed by humans, such as Document Type Classification and Document Reading. The lack of identification documents public datasets delays the research development in document image processing because researchers need to attempt partnerships with private or governmental institutions to obtain the data or build their dataset. In this context, this work presents as main contributions a system to support the automatic creation of identification document public datasets and the Brazilian Identity Document Dataset (BID Dataset): the first Brazilian identification documents public dataset. To accomplish the current personal data privacy law, all information in the BID Dataset comes from fake data. This work aims to increase the velocity of research development in identification document image processing, considering that researchers will be able to use the BID Dataset to develop their research freely.
This paper describes the short-term competition on "Components Segmentation Task of Document Photos" that was prepared in the context of the "16th International Conference on Document Analysis and Recognition" (ICDAR 2021). This competition aims to bring together researchers working on the filed of identification document image processing and provides them a suitable benchmark to compare their techniques on the component segmentation task of document images. Three challenge tasks were proposed entailing different segmentation assignments to be performed on a provided dataset. The collected data are from several types of Brazilian ID documents, whose personal information was conveniently replaced. There were 16 participants whose results obtained for some or all the three tasks show different rates for the adopted metrics, like "Dice Similarity Coefficient" ranging from 0.06 to 0.99. Different Deep Learning models were applied by the entrants with diverse strategies to achieve the best results in each of the tasks. Obtained results show that the current applied methods for solving one of the proposed tasks (document boundary detection) are already well stablished. However, for the other two challenge tasks (text zone and handwritten sign detection) research and development of more robust approaches are still required to achieve acceptable results.
ResumoO uso de técnicas de mineração de dados tem sido amplamente utilizado para o processamento de uma grande quantidade de dados documentados. No entanto, atualmente, poucos aplicativos mostraram-se efetivos para extrair e minerar dados em diários oficiais. Este trabalho tem como objetivo apresentar um método para construção de uma aplicação que usa um algoritmo para indexar conteúdo da base do Diário Oficial do Estado de Pernambuco, transformando as informações anteriormente disponíveis no texto para o formato estruturado, para aplicar uma Mineração de Dados. Para o desenvolvimento do método, a linguagem Java foi utilizada, com a possibilidade do aplicativo web. O estudo de caso baseou-se em documentos publicados no Diário Oficial de janeiro de 2007 a abril de 2017. Os resultados mostram que é possível indexar e estruturar esses dados, mas ainda há necessidade de uma melhor padronização dos dados.Palavras-Chave:Mineração de Dados; Diário Oficial; Árvore de Decisão. AbstractThe use of Data Mining techniques has been widely applied for processing a high amount of documented data. However, to date, there are very few effective applications for extracting and mining data in official journals. This work aims to present a method for the construction of an application that uses an algorithm to index contents of the base of the Official Gazette of the state of Pernambuco, transforming the information previously available in the text to structured format, to apply a Mining of Data. For the development of the method, the Java language was used, with the possibility of the web application. The case study was based on documents published in the Official . The results show that it is possible to index this data and give meaning to it, but there is still a need for a better standardization of the data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.