2019 Artificial Intelligence for Transforming Business and Society (AITB) 2019
DOI: 10.1109/aitb48515.2019.8947440

One-Shot Template Matching for Automatic Document Data Capture

Abstract: In this paper, we propose a novel one-shot template-matching algorithm to automatically capture data from business documents with an aim to minimize manual data entry. Given one annotated document, our algorithm can automatically extract similar data from other documents having the same format. Based on a set of engineered visual and textual features, our method is invariant to changes in position and value. Experiments on a dataset of 595 real invoices demonstrate 86.4% accuracy.

Cited by 10 publications (6 citation statements)
References 13 publications

“…To ensure a high variance of document layouts in the dataset, unlabeled documents were clustered into layouts 5 . Only a limited number of documents per layout were selected for annotation.…”
Section: Dataset Characteristics (mentioning)
confidence: 99%
“…Information extraction approaches must handle varying layouts, semantic fields and multiple input modalities at the intersection of computer vision, natural language processing and information retrieval. While there has been progress on the task [4,7,14,15,18,19,25,34], there is no publicly available large-scale benchmark to train and compare these approaches, an issue that has been noted by several authors [5,16,24,26,29]. Existing approaches are trained on privately collected datasets, hindering their reproducibility, fair comparisons and tracking field progression [11,23,24].…”
Section: Introduction (mentioning)
confidence: 99%
“…One-shot principle is studied in information extraction and can be also applied without the need of any learnable parameters, for example as templates matching [7].…”
Section: Previous and Other Work (mentioning)
confidence: 99%
“…However, such methods often fail when a document with unseen template is encountered [22]. To improve template-based methods, in [5] an one-shot template-matching algorithm invariant to changes in position is proposed. Methods that work on unseen document formats were proposed in [19,13].…”
Section: Introduction (mentioning)
confidence: 99%