Abstract: Companies order, receive, and pay for goods. Hence they continually receive and process invoices. For the most part these are printed on paper and are dealt with manually, so that each invoice after receipt involves processing costs of about 9 Euro on average. Often, human searching and typing of data into computer forms is required to transfer the information from paper into the computer, e.g. into ERP-systems, like SAP, that many companies run. This article presents the main results of our 300-page…
“…The performance with such generalized annotations was comparable to performance with annotations specifically made for each invoice. This approach saves significant annotation effort, since many companies receive most of their invoices from a small subset of vendors [8]. Reducing the need for human annotation further is the subject of future work.…”
Section: Discussion
“…However, despite its recognized value in business workflows, such data extraction tasks suffer from inadequate or unreliable levels of automation and are still largely done manually. The cost of manual data extraction can be quite high; for example, manually processing a single invoice can cost up to 9 Euro [8]. Large businesses may process tens of thousands of invoices per day, leading to high cost of operations.…”
Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For example, in invoices, the names, unit prices, quantities and other descriptors of every line item are laid out in a consistent spatial structure. We propose a general method for extracting such repeated structure from documents. After receiving a single example of the structure to be found, the proposed method localizes additional instances of this structure in the same document and in additional documents. A wide variety of perceptually motivated cues (such as alignment and saliency) is used for this purpose. These cues are combined in a probabilistic model, and a novel algorithm for exact inference in this model is proposed and used. We demonstrate that this method can cope with complex instances of repeated structure and generalizes successfully across a wide range of structure variations.
“…There is, of course, a connection between ballot reading and automatic forms processing, a topic which has been heavily studied in our field (e.g., [11,12,13]), as well as to the scoring of standardized tests, as noted earlier. Processing paper ballots used in elections differs from these other tasks in important ways, however.…”
“…Form reading achieved commercial viability after a decade of experimentation 6,7,8,9 . Specialized algorithms were crafted to detect parallel rulings in large forms or drawings 10,11 .…”
Geometric invariants are combined with edit distance to compare the ruling configuration of noisy filled-out forms. It is shown that gap-ratios used as features capture most of the ruling information of even low-resolution and poorly scanned form images, and that the edit distance is tolerant of missed and spurious rulings. No preprocessing is required and the potentially time-consuming string operations are performed on a sparse representation of the detected rulings. Based on edit distance, 158 Arabic forms are classified into 15 groups with 89% accuracy. Since the method was developed for an application that precludes public dissemination of the data, it is illustrated on public-domain death certificates.
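The idea in the abstract above, comparing ruling configurations through gap-ratios and an edit distance, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not define the gap-ratio feature or the edit costs, so the definition used here (ratio of each gap to the preceding gap), the unit costs, the relative tolerance `tol`, and the function names `gap_ratios` and `edit_distance` are all assumptions.

```python
from typing import List, Sequence


def gap_ratios(positions: Sequence[float]) -> List[float]:
    """Ratios of consecutive gaps between sorted ruling positions.

    Assumed feature definition: each gap divided by the gap before it.
    Such ratios do not change when the form is shifted or uniformly scaled.
    """
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    return [g2 / g1 for g1, g2 in zip(gaps, gaps[1:]) if g1 > 0]


def edit_distance(a: Sequence[float], b: Sequence[float], tol: float = 0.1) -> int:
    """Levenshtein distance over numeric sequences.

    Two ratios count as a match when they differ by at most `tol`
    relative to the larger one (an assumed matching rule).
    Insertions, deletions, and substitutions each cost 1.
    """
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            x, y = a[i - 1], b[j - 1]
            sub = 0 if abs(x - y) <= tol * max(x, y) else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + sub)
        prev = curr
    return prev[n]


# Hypothetical ruling positions (e.g. y-coordinates of horizontal lines).
template = [0, 10, 30, 40, 80]
scaled = [5 + 2 * y for y in template]   # same form, shifted and scaled
missed = [0, 10, 40, 80]                 # one ruling not detected

d_same = edit_distance(gap_ratios(template), gap_ratios(scaled))
d_missed = edit_distance(gap_ratios(template), gap_ratios(missed))
```

Under these assumptions, a shifted and rescaled copy of a form yields the same gap-ratio sequence as the original, while a missed ruling changes only a few neighbouring ratios, so the distance stays bounded rather than growing with the size of the form. A classifier along these lines could assign a form to the template with the smallest distance.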