A Complete Optical Character Recognition Methodology for Historical Documents

Vamvakas, Georgios; Gatos, Basilis; Stamatopoulos, Nikolaos; Perantonis, Stavros

doi:10.1109/das.2008.73

Cited by 54 publications

(35 citation statements)

References 18 publications

“…The first was the test set of the ICDAR 2007 Handwriting segmentation competition [1] while the second was a set of Greek historical typewritten documents [2].…”

Section: Results (citation type: mentioning)

confidence: 99%

A Novel Two Stage Evaluation Methodology for Word Segmentation Techniques

Louloudis

Stamatopoulos

Gatos

2009

2009 10th International Conference on Document Analysis and Recognition

Word segmentation is a critical stage towards word and character recognition as well as word spotting and mainly concerns two basic aspects, distance computation and gap classification. In this paper, we propose a robust evaluation methodology that treats the distance computation and the gap classification stages independently. The detection rate calculated for every distance metric corresponds to the maximum detection rate that we could have achieved if we had a perfect classifier for the gap classification stage. The proposed evaluation framework has been applied to several state-of-the-art techniques using a handwritten as well as a historical typewritten document set. The best combination of distance metric computation and gap classification state-of-the-art techniques is proposed.

“…The proposed evaluation framework has been applied to several state-of-the-art techniques using two different document image sets. The two sets comprise a) the test set of the ICDAR2007 handwriting segmentation competition [1] and b) a set of Greek historical typewritten documents [2].…”

Section: Introduction (citation type: mentioning)

confidence: 99%

“…Kohonen algorithm that is one of Artificial neural network The experiments also demonstrated that system complexity can be reduced significantly without degrading performance by considering two-layered neural network rather than multiple layered neural networks [14]. In this paper [15] a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. The pre-processing and segmentation approach is used in order to detect text lines, words, and characters.…”

Section: Methods (citation type: mentioning)

confidence: 99%

A Review on Optical Character Recognition Techniques

Modi

Parikh

2017

IJCA

At present scenario, there is growing demand for the software system to recognize characters in a computer system when information is scanned through paper documents. This paper presents detailed review in the field of Optical Character Recognition. Various techniques are determined that have been proposed to realize the center of character recognition in an optical character recognition system. OCR (Optical Character Recognition) translates images of typewritten or handwritten characters into the electronically editable format and it preserves font properties. Different techniques for preprocessing and segmentation have been surveyed and discussed in this paper. General Terms: Pattern Matching.

“…In most cases, historical document recognition systems produce a recognition result that is evaluated in terms of character accuracy at the levels of 90% -95% [15,19]. One of the reasons for this is the fact that several errors are introduced during the segmentation phase of historical documents.…”

Section: Introduction (citation type: mentioning)

confidence: 99%

“…In [4] an open-source programming framework is introduced for building systems that extract information from digitized historical documents empowering the document experts themselves to develop systems with reduced effort. In [19], a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. It consists of a pre-processing step, a top-down segmentation step as well as a clustering scheme in order to group characters of similar shape.…”

Section: Introduction (citation type: mentioning)

confidence: 99%

A comprehensive evaluation methodology for noisy historical document recognition techniques

Stamatopoulos

Louloudis

Gatos

2009

Proceedings of the Third Workshop on Analytics for Noisy Unstructured Text Data

In this paper, we propose a new comprehensive methodology in order to evaluate the performance of noisy historical document recognition techniques. We aim to evaluate not only the final noisy recognition result but also the main intermediate stages of text line, word and character segmentation. For this purpose, we efficiently create the text line, word and character segmentation ground truth guided by the transcription of the historical documents. The proposed methodology consists of (i) a semiautomatic procedure in order to detect the text line, word and character segmentation ground truth regions making use of the correct document transcription, (ii) calculation of proper evaluation metrics in order to measure the performance of the final OCR result as well as of the intermediate segmentation stages. The semi-automatic procedure for detecting the ground truth regions has been evaluated and proved efficient and time saving. Experimental results prove that using the proposed technique, the percentage of time saved for the text line, word and character segmentation ground truth creation is more than 90%. An analytic experiment using a commercial OCR engine applied to a historical book is also presented.
