2012
DOI: 10.1117/12.908542

Automatic indexing of scanned documents: a layout-based approach

Cited by 28 publications (16 citation statements)
References 9 publications
“…In literature there are several works that addressed the problem of document structuring for user application dealing with semantic search engine. In [9] the authors propose an approach to handle automatic indexing of documents based on generic positional extraction of index terms. For this purpose is applied the knowledge of document templates stored in a common full text search index to find index positions that were successfully extracted in the past.…”
Section: Related Work (mentioning)
confidence: 99%
“…Cesarini et al [6] learns a database of keywords for each template and fall back to a global database of keywords. Esser et al [7] uses a database of absolute positions of fields for each template. Medvet et al [8] uses a database of manually created (field, pattern, parser) triplets for each template, designs a probabilistic model for finding the most similar pattern in a template, and extracts the value with the associated parser.…”
Section: Related Work (mentioning)
confidence: 99%
“…A number of systems have been proposed that rely on first classifying the template, e.g. Intellix [3], ITESOFT [4], smartFIX [5] and others [6], [7], [8]. As these systems rely on having seen the template beforehand, they cannot accurately handle documents from unseen templates.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many documents used in enterprises and governments are typically derived from templates, especially forms completed by users, e.g., tax forms, medical forms, job application forms, etc. Given a set of templates and a scanned paper document, an open problem is to quickly and accurately identify which template this scanned document was originally derived from (Esser et al, 2011). To solve this problem, a number of systems based on labeled information have been proposed and developed (Cunningham et al, 2002;T.…”
Section: Introduction (mentioning)
confidence: 99%
“…Studies have been performed to use image features to match a scanned document to its template (Hu et al, 2000). Some of these studies still require labeled information (Esser et al, 2011), while others require consistent high-quality data in order to function properly.…”
Section: Introduction (mentioning)
confidence: 99%