2D barcodes are growing in significance as the ubiquity of high-resolution cameras, combined with the availability of variable data printing, drives a rising number of "click and connect" applications. Barcodes therefore serve as an increasingly important connection between the physical and electronic portions, or versions, of documents. The use of color provides additional advantages, including increased payload density and security. In this paper, we consider four factors affecting the readable payload in a color barcode: (1) the number of print-scan (PS), or copy, cycles; (2) image restoration to offset PS-induced degradation; (3) the authentication algorithm used; and (4) the use of spectral pre-compensation (SPC) to optimize the color settings for the color barcodes. The PS cycle was shown to consistently reduce payload density by approximately 55% under all tested conditions. SPC nearly doubled the payload density, and selecting the better authentication algorithm increased payload density by roughly 50% on average. Restoration, however, was found to increase payload density less substantially (~30%), and only when combined with the optimized settings for SPC. These results are also discussed in light of optimizing payload density for the generation of document security deterrents.
To address cost and regulatory concerns, many businesses are converting paper-based elements of their workflows into fully electronic flows that use the content of the documents. Scanning the document contents into workflows, however, is a manual, error-prone, and costly process, especially when the data extraction process requires high accuracy. These manual costs are a primary barrier to widespread adoption of distributed capture solutions for business-critical workflows such as insurance claims, medical records, or loan applications. Software solutions using artificial intelligence and natural language processing techniques are emerging to address these needs, but each has its own strengths and weaknesses, and none has demonstrated a high level of accuracy across the many unstructured document types included in these business-critical workflows. This paper describes how to overcome many of these limitations by intelligently combining multiple approaches for document classification using meta-algorithmic design patterns. These patterns explore the error space in multiple engines, and provide improved and "emergent" results in comparison to voting schemes and to the output of any of the individual engines. This paper considers the results of the individual engines along with traditional combinatorial techniques such as voting, before describing prototype results for a variety of novel meta-algorithmic patterns that reduce individual document error rates by up to 13% and reduce system error rates by up to 38%.
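As an illustration of the traditional voting baseline mentioned above (not the paper's meta-algorithmic patterns, which the abstract does not specify), the following is a minimal sketch of plain and weighted voting over the labels returned by several classification engines. The function names, the example labels, and the weights are hypothetical; in practice the weights might come from each engine's measured accuracy on a training set.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the label chosen by the most engines.

    `labels` is a non-empty list with one predicted class per engine.
    Ties are broken in favor of the earliest engine in the list.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def weighted_vote(labels, weights):
    """Return the label with the largest summed engine weight.

    `weights` holds one non-negative number per engine (hypothetical
    values here, e.g. an engine's accuracy on held-out documents).
    """
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    # max() keeps the first label seen when scores are tied.
    return max(scores, key=scores.get)


# Three hypothetical engines classify the same scanned document.
engine_labels = ["claim", "loan", "claim"]
plain = majority_vote(engine_labels)
weighted = weighted_vote(engine_labels, [0.6, 0.9, 0.2])
```

With these inputs the plain vote follows the two agreeing engines, while the weighted vote follows the single engine that carries the largest weight. This shows why a combination scheme can disagree with a simple majority.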
We present design strategies, implementation preferences, and throughput results obtained in deploying a UI-based ground truthing engine as the last step in the quality assurance (QA) for the conversion of a large out-of-print book collection into digital form. A series of automated QA steps was first performed on the document. Five distinct zoning analysis options were deployed, and the PDF output thus generated was used to regenerate TIFF files for comparison to the originals. Regenerated TIFFs failing automated QA or a separate visual QA were tagged for ground truthing. Fewer than 3% of the pages in a 1.2 × 10^6-page corpus required ground truthing, resulting in a throughput rate of "fully proofed" pages of 2 × 10^5 pages/man-week. Among the design advantages crucial for this throughput rate was the use of the identical zoning engine for the original production workflow and for the ground truthing engine.
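The figures quoted above can be put in perspective with a little arithmetic. The sketch below is only a back-of-the-envelope reading of the abstract's numbers: it assumes the stated throughput applies to the whole corpus and treats 3% as an upper bound on the pages sent to ground truthing. The function names are hypothetical.

```python
CORPUS_PAGES = 1_200_000            # 1.2 x 10^6 pages, from the abstract
MAX_GROUND_TRUTH_PERCENT = 3        # "fewer than 3%" used as an upper bound
THROUGHPUT_PAGES_PER_WEEK = 200_000 # 2 x 10^5 fully proofed pages/man-week


def max_ground_truthed_pages(corpus_pages, percent):
    """Upper bound on the pages needing manual ground truthing."""
    return corpus_pages * percent // 100


def implied_man_weeks(corpus_pages, pages_per_man_week):
    """Effort implied if the throughput rate covers the whole corpus."""
    return corpus_pages / pages_per_man_week


gt_pages = max_ground_truthed_pages(CORPUS_PAGES, MAX_GROUND_TRUTH_PERCENT)
weeks = implied_man_weeks(CORPUS_PAGES, THROUGHPUT_PAGES_PER_WEEK)
print(f"at most {gt_pages} pages ground truthed, about {weeks:g} man-weeks")
```

Under these assumptions, at most 36,000 pages needed manual attention and the corpus corresponds to roughly six man-weeks of effort. The overall rate is high because the automated QA steps cleared the great majority of pages.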