Fast document image comparison in multilingual corpus without OCR

Lin, Yuping; Li, Yingyu; Song, Yonghong; Fang, Wanliang

doi:10.1007/s00530-015-0484-3

Cited by 3 publications

(6 citation statements)

References 23 publications

“…In [137], dense SIFT descriptors were used. Segmentation of the document image not only into text lines but also into characters, as proposed in [139], can also be used. The results presented in this work show that the proposed method can handle images of multilingual documents with different resolutions and font sizes.…”

Section: Document Comparison

mentioning

confidence: 99%

Document image analysis and recognition: a survey

et al. 2022

This paper analyzes the problems of document image recognition and the existing solutions. Document recognition algorithms have been studied for quite a long time, but despite this, currently, the topic is relevant and research continues, as evidenced by a large number of associated publications and reviews. However, most of these works and reviews are devoted to individual recognition tasks. In this review, the entire set of methods, approaches, and algorithms necessary for document recognition is considered. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with text and handwritten contents, with a fixed template and flexible structure, and digitalized via different ways: scanning, photographing, video recording. Here, we consider methods of document recognition and analysis applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. The groups of methods necessary for the recognition of a single page image are examined: the classical computer vision algorithms, i.e., keypoints, local feature descriptors, Fast Hough Transforms, image binarization, and modern neural network models for document boundary detection, document classification, document structure analysis, i.e., text blocks and tables localization, extraction and recognition of the details, post-processing of recognition results. The review provides a description of publicly available experimental data packages for training and testing recognition algorithms. Methods for optimizing the performance of document image analysis and recognition methods are described.

“…The first group of methods is used to compare texts of electronic versions of documents, so such methods are not adapted for comparing images of documents, but can detect not only structural but also semantic modifications [4]. The second group is used for the comparison of document images without character recognition, in particular for layout comparison of documents [5,6]. Disadvantages of these methods are that only the spatial content of the document is compared [6].…”

Section: Introduction

mentioning

confidence: 99%

“…Disadvantages of these methods are that only the spatial content of the document is compared [6]. Testing the methods described in [5] was not carried out on public dataset.…”

Section: Introduction

mentioning

confidence: 99%

Comparison of scanned administrative document images

Andreeva, Arlazarov, Slavin

et al. 2020

Twelfth International Conference on Machine Vision (ICMV 2019)

In this work the methods of comparison of digitized copies of administrative documents were considered. This problem arises, for example, when comparing two copies of documents signed by two parties in order to find possible modifications made by one party, in the banking sector at the conclusion of contracts in paper form. The proposed method of document image comparison is based on a combination of several ways of image comparison of words that are descriptors of text feature points. Testing was conducted on public Payslip Dataset (French). The results showed the high quality and the reliability of finding differences in two images that are versions of the same document.

“…Specifically, this special issue targeted the most recent technical progresses on learning techniques for high-dimensional multimedia data, including classification [1], segmentation [2,4], feature selection [1,5,8], deep learning [5], image saliency detection [7], and many others [3,6], in many kinds of learning-based applications, including image processing sequences [1,2,7,8], text processing [1,4,6], system applications [6]. The topics of the special issue are interesting, so in total, this special issue have received 24 submissions from at least 20 different research departments over the world.…”

mentioning

confidence: 99%

“…The paper by Lin et al [4] proposed to compare document images in multilingual corpus, which was composed of character segmentation, feature extraction and similarity measure. The paper applied projection and self-adaptive threshold to analyze the layout and then segment the text line by horizontal projection.…”

mentioning

confidence: 99%
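
The last statement above mentions segmenting text lines by horizontal projection. As a rough illustration of that general technique only, and not the implementation of Lin et al., the sketch below sums the ink pixels in each row of a binarized page and treats each run of non-empty rows as one text line. The function name `segment_text_lines`, the `min_ink` parameter, and the toy `page` are all hypothetical, and a fixed threshold stands in for the self-adaptive threshold the statement refers to.

```python
from typing import List, Tuple


def segment_text_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows containing ink.

    `binary` is a page image as rows of 0 (background) and 1 (ink).
    `min_ink` is a fixed, assumed threshold; it is not self-adaptive.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary]

    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                    # first row of a new text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))     # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line touching the bottom edge
    return lines


# Toy 5x4 "page": two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(segment_text_lines(page))  # [(1, 3), (4, 5)]
```

Applying the same idea to column sums within each detected line (a vertical projection) is one common way to go on to character-level segmentation, although the statements above do not specify how the cited paper does this.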