Results of a Study on Invoice-Reading Systems in Germany

Klein, Bertin; Agne, Stefan; Dengel, Andreas

doi:10.1007/978-3-540-28640-0_43

Cited by 28 publications

(15 citation statements)

References 4 publications

“…The performance with such generalized annotations was comparable to performance with annotations specifically made for each invoice. This approach saves significant annotation effort, since many companies receive most of their invoices from a small subset of vendors [8]. Reducing the need for human annotation further is the subject of future work.…”

Section: Discussion (mentioning)

confidence: 99%

“…However, despite its recognized value in business workflows, such data extraction tasks suffer from inadequate or unreliable levels of automation and are still largely done manually. The cost of manual data extraction can be quite high; for example, manually processing a single invoice can cost up to 9 Euro [8]. Large businesses may process tens of thousands of invoices per day, leading to high cost of operations.…”

Section: Introduction (mentioning)

confidence: 99%

Information extraction by finding repeated structure

Bart; Sarkar

2010

Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For example, in invoices, the names, unit prices, quantities and other descriptors of every line item are laid out in a consistent spatial structure. We propose a general method for extracting such repeated structure from documents. After receiving a single example of the structure to be found, the proposed method localizes additional instances of this structure in the same document and in additional documents. A wide variety of perceptually motivated cues (such as alignment and saliency) is used for this purpose. These cues are combined in a probabilistic model, and a novel algorithm for exact inference in this model is proposed and used. We demonstrate that this method can cope with complex instances of repeated structure and generalizes successfully across a wide range of structure variations.

“…There is, of course, a connection between ballot reading and automatic forms processing, a topic which has been heavily studied in our field (e.g., [11,12,13]), as well as to the scoring of standardized tests, as noted earlier. Processing paper ballots used in elections differs from these other tasks in important ways, however.…”

Section: Introduction (mentioning)

confidence: 94%

A Document Analysis System for Supporting Electronic Voting Research

Lopresti; Nagy; Smith

2008

2008 The Eighth IAPR International Workshop on Document Analysis Systems

“…Form reading achieved commercial viability after a decade of experimentation [6,7,8,9]. Specialized algorithms were crafted to detect parallel rulings in large forms or drawings [10,11].…”

Section: Introduction (mentioning)

confidence: 99%

Form similarity via Levenshtein distance between ortho-filtered logarithmic ruling-gap ratios

Nagy; Lopresti

2013

SPIE Proceedings

Geometric invariants are combined with edit distance to compare the ruling configuration of noisy filled-out forms. It is shown that gap-ratios used as features capture most of the ruling information of even low-resolution and poorly scanned form images, and that the edit distance is tolerant of missed and spurious rulings. No preprocessing is required and the potentially time-consuming string operations are performed on a sparse representation of the detected rulings. Based on edit distance, 158 Arabic forms are classified into 15 groups with 89% accuracy. Since the method was developed for an application that precludes public dissemination of the data, it is illustrated on public-domain death certificates.
