Block segmentation and text extraction in mixed text/image documents

Wahl, Friedrich M.; Wong, Kwan Y.; Casey, R. G.

doi:10.1016/0146-664x(82)90059-4

Cited by 341 publications

(44 citation statements)

References 2 publications

“…The use of a novel Adaptive Run Length Smoothing Algorithm (ARLSA), which is a modified version of the state-of-the-art RLSA [5] and efficiently groups homogeneous document regions. The definition of background obstacles that ARLSA is not allowed to cross in order to avoid merging neighboring text columns or text lines.…”

Section: Proposed Methodology (mentioning)

confidence: 99%

“…These techniques can be categorized based on the document image segmentation algorithm that they adopt. The best known of these segmentation algorithms are the following: X-Y cuts or projection-profile-based [4], Run Length Smoothing Algorithm (RLSA) [5], component grouping [6], document spectrum [7], whitespace analysis [8], constrained text lines [9], Hough transform [10,11], Voronoi tessellation [12] and scale space analysis [13]. All of the above segmentation algorithms are mainly designed for contemporary documents.…”

Section: Related Work (mentioning)

confidence: 99%

“…The RLSA [5] is one of the most common algorithms used in page layout analysis and segmentation techniques. It operates on binary images in a specified direction (usually horizontal or vertical) by replacing a sequence of background pixels with foreground pixels if the number of background pixels in the sequence is smaller than or equal to a predefined threshold T_max.…”

Section: Adaptive Run Length Smoothing (mentioning)

confidence: 99%
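
The smoothing rule described in the statement above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation from the cited papers: the function names (rlsa_row, rlsa_horizontal, rlsa_vertical) and the parameter name t_max are hypothetical, foreground is assumed to be 1 and background 0, and background runs that touch the image border are assumed to stay unfilled, since the statement does not say how borders are handled.

```python
def rlsa_row(row, t_max):
    """Smooth one scan line: fill short background runs between foreground pixels.

    Assumptions (not from the source): foreground = 1, background = 0, and
    runs touching either end of the line are left untouched.
    """
    out = list(row)
    run_start = None          # index where the current background run began
    seen_foreground = False   # True once a foreground pixel has been passed
    for i, pixel in enumerate(row):
        if pixel == 1:
            # A background run just ended; fill it if it is short enough
            # and is bounded by foreground on its left side as well.
            if seen_foreground and run_start is not None and i - run_start <= t_max:
                for j in range(run_start, i):
                    out[j] = 1
            seen_foreground = True
            run_start = None
        elif run_start is None:
            run_start = i
    return out


def rlsa_horizontal(image, t_max):
    """Apply the row rule to every scan line of a binary image (list of lists)."""
    return [rlsa_row(row, t_max) for row in image]


def rlsa_vertical(image, t_max):
    """Apply the same rule along columns by transposing, smoothing, transposing back."""
    columns = [list(col) for col in zip(*image)]
    smoothed = rlsa_horizontal(columns, t_max)
    return [list(row) for row in zip(*smoothed)]
```

With t_max = 2, the line [0, 1, 0, 0, 1, 0, 0, 0, 1, 0] has its two-pixel gap filled while the three-pixel gap and the border runs are left alone. In practice t_max is chosen relative to character and line spacing, so that characters merge into words or lines without bridging column gutters.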

Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths

Nikolaou, Makridis, Gatos et al. (2010), Image and Vision Computing

101

“…It then segregates the image from the large white rectangles. The run-length smearing method [21], on the other hand, seeks to enter at the bottom of the hierarchy. It goes along each scan line and blackens the small spaces between black pixels.…”

Section: Document Layout Analysis (mentioning)

confidence: 99%

Retrieving information from document images: problems and solutions

Chang (2001), IJDAR

An information retrieval system that captures both visual and textual contents from paper documents can derive maximal benefits from DAR techniques while demanding little human assistance to achieve its goals. This article discusses technical problems, along with solution methods, and their integration into a well-performing system. The discussion focuses on very difficult applications, for example, Chinese and Japanese documents. Solution methods are also highlighted, with the emphasis placed upon some new ideas, including window-based binarization using scale measures, document layout analysis for solving the multiple constraint problem, and full-text searching techniques capable of evading machine recognition errors.

“…There are two classes of document segmentation methods. The first class uses bottom-up techniques, including O'Gorman's Docstrum algorithm [8], the Voronoi diagram based algorithm of Kise et al. [9], the run-length smearing algorithm of Wahl et al. [10], the segmentation algorithm of Jain and Yu [11], the text string separation algorithm of Fletcher and Kasturi [12], the 'white tiles' method of Antonacopoulos [13] and Mitchell and Yan's pattern spread and soft ordering methods [14,15]. The second class consists of top-down techniques, including the X-Y-cut-based algorithm of Nagy et al. [16], and the shape-directed-covers based algorithm of Baird et al. [17].…”

Section: Introduction (mentioning)

confidence: 99%