Text Lines and Snippets Extraction for 19th Century Handwriting Documents Layout Analysis

This paper presents an approach to extracting curved text lines from Arabic handwritten documents, based on the perception mechanisms involved in human reading. The approach uses a multi-agent system to detect and group connected components that belong to the same line, drawing on the features and spatial arrangement of those components. Experimental results on a dataset of Arabic handwritten documents suggest that the approach is a promising solution for extracting curved handwritten text lines.
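
The abstract above relies on connected components as the units that are later grouped into lines. As a rough illustration only, the sketch below labels 8-connected foreground regions in a small binary image and returns their bounding boxes. It is not the authors' multi-agent system: the function name `connected_components`, the list-of-lists image format, and the choice of 8-connectivity are all assumptions made for this example.

```python
from collections import deque


def connected_components(image):
    """Return inclusive bounding boxes (top, left, bottom, right) of the
    8-connected foreground regions in a binary image (1 = ink).

    Hypothetical helper for illustration; not taken from the paper.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


page = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
boxes = connected_components(page)
```

A grouping stage, whatever its design, could then compare these boxes by position and size to decide which components share a line.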

“…Other methods exist, as the k-means algorithm [11], the Hough transform [12,13] and active contour technique [14].…”

Section: Related Work

mentioning (confidence: 99%)

Arabic handwritten text line extraction using connected component analysis from a multi agent perspective

Boulid, Souhar, Elkettani

2015

2015 15th International Conference on Intelligent Systems Design and Applications (ISDA)

“…11 (c), the proposed method has correctly segmented the text lines. Figure 12 (a) [4] shows the result of the piece-wise projection method [4] 13 Segmentation results on a difficult Korean document image of the UMD dataset produced by (a) Mumford-Shah model [18], (b) PDF-based method [16], and (c) the proposed method. Shah model [18] and PDF-based [16] method use horizontal blurring filters.…”

Section: Evaluation Methodology

mentioning (confidence: 99%)

Adaptive Script-Independent Text Line Extraction

Ziaratban, Faez

2011

IEICE Trans. Inf. & Syst.

SUMMARY: In this paper, an adaptive block-based text line extraction algorithm is proposed. Three global and two local parameters are defined to adapt the method to various handwritings in different languages. A document image is segmented into several overlapping blocks. The skew of each block is estimated, and the block is de-skewed using the estimated skew angle. Text regions are then detected in the de-skewed block, and a number of data points are extracted from the detected text regions. These data points are used to estimate the paths of the text lines. By thinning the background of the image that includes the text line paths, the text line boundaries, or separators, are estimated. Furthermore, an algorithm is proposed to assign connected components that intersect the estimated separators to the extracted text lines. Extensive experiments on standard datasets in various languages demonstrate that the proposed algorithm outperforms previous methods.
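
Two of the steps in the summary above, splitting the page into overlapping blocks and finding text regions inside a block, can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the published algorithm: it skips skew estimation entirely, it detects text regions with a plain row projection profile (the summary does not say how regions are detected), and the names `split_into_blocks`, `text_regions`, and `min_ink` are hypothetical.

```python
def split_into_blocks(width, block_width, overlap):
    """Return (start, end) column ranges, end exclusive, that cover
    [0, width) with neighbouring blocks sharing `overlap` columns."""
    if not 0 <= overlap < block_width:
        raise ValueError("overlap must be in [0, block_width)")
    step = block_width - overlap
    blocks = []
    start = 0
    while True:
        end = min(start + block_width, width)
        blocks.append((start, end))
        if end == width:
            break
        start += step
    return blocks


def text_regions(image, col_start, col_end, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, where each row of
    the block holds at least `min_ink` foreground pixels (1 = ink).

    Assumption: a row projection profile stands in for the paper's
    unspecified text-region detector.
    """
    regions = []
    top = None
    for y, row in enumerate(image):
        ink = sum(row[col_start:col_end])
        if ink >= min_ink:
            if top is None:
                top = y  # a new text region starts here
        elif top is not None:
            regions.append((top, y))
            top = None
    if top is not None:
        regions.append((top, len(image)))
    return regions


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
blocks = split_into_blocks(width=10, block_width=4, overlap=2)
left_regions = text_regions(page, 0, 3)
right_regions = text_regions(page, 3, 6)
```

In the toy page, the line in the right block sits one row lower than the one in the left block, which is the kind of local drift that processing each block separately is meant to absorb.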

“…The denoting of the text space is performed by a tracking script to create a curvilinear separation path between each pair of subsequent text lines, which in result leads to finding the separate text fragments. In [8] the text area is identified by first using image binarization and later separating the graph of connected components (CC) with segmentation methods. In a next step, Hough transform is applied to define each connected component and to calculate the distance vector for each graph component, resulting in designating the external edge blocks.…”

Section: Introduction

mentioning (confidence: 99%)

Text area detection in handwritten documents scanned for further processing

Pach, Antoniuk, Krupa

2020

MG&V

In this paper we present an approach to text area detection using binary images, the Constrained Run Length Algorithm, and other noise reduction methods for removing artefacts. Text processing includes various activities, most of which prepare the input data so that it does not hinder the OCR algorithms in later steps. This is especially the case for handwritten manuscripts, and even more so for very old documents. We present our methodology for the text area detection problem, which is capable of removing most irrelevant objects, including page edges, stains, and folds. At the same time, the presented method can handle multi-column texts and varying line thickness. The generated mask can accurately mark the actual text area, so that the output image can be easily used in further text processing steps.
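
The abstract above names the Constrained Run Length Algorithm without describing it. A minimal sketch of horizontal run-length smoothing follows, assuming the common formulation in which a background run is filled when it is no longer than a threshold and is bounded by ink on both sides. The function names and the `max_gap` parameter are hypothetical, and the paper's actual variant and thresholds may differ.

```python
def smooth_row(row, max_gap):
    """Fill background runs (0s) of length <= max_gap that lie between two
    foreground pixels (1s). Runs touching the row border are left alone,
    which is an assumption of this sketch."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def smooth_horizontal(image, max_gap):
    """Apply run-length smoothing to every row of a binary image."""
    return [smooth_row(row, max_gap) for row in image]


smoothed = smooth_row([1, 0, 0, 1, 0, 0, 0, 0, 1, 0], max_gap=2)
```

With a suitable threshold, the gaps between letters and words close up so that each text line becomes a solid band, while wider gaps such as column gutters stay open.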

Text Lines and Snippets Extraction for 19th Century Handwriting Documents Layout Analysis

Cited by 12 publications

References 9 publications

Arabic handwritten text line extraction using connected component analysis from a multi agent perspective

Adaptive Script-Independent Text Line Extraction

Text area detection in handwritten documents scanned for further processing
