Anais Estendidos da Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2020) 2020
DOI: 10.5753/sibgrapi.est.2020.12997

BID Dataset: a challenge dataset for document processing tasks

Abstract: The digital relationship between companies and customers happens through online systems where consumers must upload their identification documents pictures to prove their identities. The existence of this large volume of document images encourages the research development to generate image processing systems to automate tasks usually performed by humans, such as Document Type Classification and Document Reading. The lack of identification documents public datasets delays the research development in document im…

citations
Cited by 18 publications
(8 citation statements)
references
References 14 publications
“…These documents are usually issued by governments, have strict design and security features, and their main goal is to define, verify and prove the holder's identity. The scope of usage of automatic system for identity document analysis include simplification and automatization of data entry when filling official forms [1], remote person identification [2], remote age checking [3], Know Your Customer / Anti Money Laundering (KYC / AML) procedures [4], and provision of governmental, financial, and other services.…”
Section: Introduction (mentioning)
confidence: 99%
“…As was mentioned in the previous sections, since identity documents by their nature contain sensitive information, there are very few publicly available datasets of identity document images, and those which exist contain either partial information, or contain synthetic examples of ungenuine documents. Existing datasets dedicated specifically to identity document images include LRDE Identity Document Image Database (LRDE IDID) [7], the recently published Brazilian Identity Document Dataset (BID Dataset) [4], and the Mobile Identity Document Video dataset family (MIDV) [8,9], to which the dataset presented in this paper also belongs. Some larger datasets, dedicated to address the issues of a broader document analysis problem, such as the ones from SmartDoc family [10], also contain identity document images.…”
Section: Introduction (mentioning)
confidence: 99%
“…Magee et al [33] explored the potential application of the Meijering filter [34] to the domain of recaptured identity document detection. The authors create a new dataset of recaptured images based on the publicly available BID [35] dataset and use it to train an SVM classifier on the raw histogram data obtained by using the filter. Although their system does not compare well with approaches that utilize neural networks, it remains an attractive alternative due to being transparent and explainable.…”
[Table rows fused into the quoted text, column headers not recoverable: [25] Spanish national ✗; Berenguel et al [26] Spanish national ✗; Gonzalez et al [1] Chilean national ✗; Polevoy et al [8] Various national ✓; Mudgalgundurao et al [29] German national and residence permits ✗; Chen et al [31] University student ✓; Benalcazar et al [9] Chilean national ✗; Magee et al [33] Brazilian national ✗]
Section: B Fake Id Detection (mentioning)
confidence: 99%
“…To evaluate the newly published algorithms, the traditional open datasets such as PRImA [80] (document structure analysis), COCO-text [81] (text detection and recognition from natural images), and the datasets of the international project for the development of document analysis systems MAURDOR [82] are used. In addition to them, international teams create new datasets reflecting the specifics and characteristics of individual document types, for example, the BID [83] and MIDV-500 [1] datasets for the analysis of identity documents.…”
Section: Document Structure Analysis Algorithms (mentioning)
confidence: 99%