2009 XXII Brazilian Symposium on Computer Graphics and Image Processing
DOI: 10.1109/sibgrapi.2009.40
Automatic Discrimination between Printed and Handwritten Text in Documents

Abstract: Recognition techniques for printed and handwritten text in scanned documents are significantly different. In this paper we address the problem of identifying each type. We can list at least four steps: digitization, preprocessing, feature extraction, and decision or classification. A new aspect of our approach is the use of data mining techniques in the decision step. A new set of features extracted from each word is proposed as well. Classification rules are mined and used to discern printed text from handwri…
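The abstract outlines a four-step pipeline ending in a rule-based decision stage. As a rough illustration only, the Python sketch below computes a few hypothetical word-level features and applies a toy IF-THEN rule; the feature names, the feature set, and the thresholds are assumptions made for this sketch, not the authors' actual features or mined rules.

import numpy as np

def word_features(word_img):
    """Compute simple features from a binary word image (1 = ink, 0 = background).
    This is a hypothetical feature set, not the paper's proposed one."""
    h, w = word_img.shape
    ink = word_img.sum()
    density = ink / (h * w)                    # ink fraction of the bounding box
    aspect = w / h                             # bounding-box aspect ratio
    row_profile = word_img.sum(axis=1)         # ink per row
    profile_var = row_profile.var() / max(row_profile.mean(), 1e-6)
    return {"density": density, "aspect": aspect, "profile_var": profile_var}

def classify(features):
    """Toy IF-THEN rule in the spirit of mined classification rules;
    the thresholds here are invented for illustration."""
    if features["profile_var"] < 0.5 and features["density"] > 0.15:
        return "printed"
    return "handwritten"

Printed words tend to show a more regular row profile (low variance around a stable x-height), which is the intuition the toy rule encodes.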

Cited by 28 publications (12 citation statements)
References 21 publications
“…The noise in the data resulted in binarisation failure and missing data in Figure 7-d, while the overlapping line created an occluded segment in Figure 7-c. On the other hand, although the handwritten segments shown in Figure 7-e to -h closely resemble machine-printed text, they are correctly classified as handwritten, since none of the gallery characters produces a matching score higher than the threshold T. Because of this high similarity to machine-printed samples, geometric features such as area and rectangularity, used in [6], [7], [10], [12], fail to create separable classes, and such samples would wrongly be classified as machine-printed. Table 1 compares our approach with the results provided by Zagoris et al. [3], [4], in which 15% of the samples in the PRImA-NHM dataset are utilised for training and the remaining 85% for testing.…”
Section: Gallery Creation and the HMC Results
confidence: 99%
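The contrast drawn above, between purely geometric features and gallery matching against a threshold T, can be sketched as follows; rectangularity follows its usual definition (ink area over bounding-box area), while the gallery, the match_score function, and the default threshold value are placeholders, not the cited method's actual implementation.

import numpy as np

def rectangularity(component):
    """Ratio of ink area to bounding-box area for a binary component."""
    ys, xs = np.nonzero(component)
    bbox_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return component.sum() / bbox_area

def is_printed(segment, gallery, match_score, T=0.8):
    """Label a segment as printed only if some gallery (machine-printed)
    character matches it with a score of at least T; otherwise handwritten.
    match_score and T are illustrative placeholders."""
    return any(match_score(segment, g) >= T for g in gallery)

Under this scheme, a handwritten segment that merely resembles print still scores below T against every gallery character, whereas a single scalar such as rectangularity cannot separate the two classes once their value ranges overlap.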
“…The features consist of geometrical features, statistical moments, and contour histograms. In [6], eleven features are extracted from the regions within the bounding boxes, mainly based on the ratio of statistical and geometrical measurements of the segmented words to the geometrical size (width or height) of their bounding boxes.…”
Section: Literature Review
confidence: 99%
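The normalisation idea attributed to [6], ratios of word measurements to the bounding-box width or height, can be illustrated as below; the three ratios are examples chosen for this sketch, not the actual eleven features.

import numpy as np

def ratio_features(word_img):
    """Example scale-normalised features for a binary word image;
    these are illustrative ratios, not the eleven features of [6]."""
    h, w = word_img.shape                # bounding-box height and width
    ys, xs = np.nonzero(word_img)        # ink pixel coordinates
    return {
        "ink_height_ratio": (ys.max() - ys.min() + 1) / h,
        "centroid_x_ratio": xs.mean() / w,
        "ink_area_ratio": len(xs) / (h * w),
    }

Dividing by the bounding-box dimensions makes the features independent of word size, so words scanned at different scales or resolutions remain comparable.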
“…These methods primarily use the horizontal projection profile, which is obtained by summing pixel values along the horizontal axis. Text lines are then located by finding the profile's local maxima and minima [5], [6]. Each local maximum corresponds to a text line, while each local minimum indicates interline spacing.…”
Section: Introduction
confidence: 99%
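A minimal sketch of this projection-profile idea, assuming a binary page image (1 = ink) that begins and ends with background rows; the smoothing window and gap threshold are assumptions, not values taken from [5] or [6].

import numpy as np

def projection_profile(page_img, win=5):
    """Row sums of a binary page image, lightly smoothed."""
    profile = page_img.sum(axis=1)                    # sum along each row
    kernel = np.ones(win) / win
    return np.convolve(profile, kernel, mode="same")

def line_boundaries(profile, min_gap=0.05):
    """Treat rows where the smoothed profile falls below a fraction of its
    peak as interline spacing (local minima); the runs in between are lines."""
    is_gap = profile < min_gap * profile.max()
    starts = np.flatnonzero(is_gap[:-1] & ~is_gap[1:]) + 1   # gap -> text
    ends = np.flatnonzero(~is_gap[:-1] & is_gap[1:]) + 1     # text -> gap
    return list(zip(starts, ends))

Each (start, end) pair brackets one local maximum of the profile, i.e. one text line; skewed pages need deskewing first, since skew flattens the profile's peaks.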