This paper presents a new method of page segmentation based on analysis of the background (white areas). The proposed method is capable of segmenting pages with non-rectangular layouts as well as with various angles of skew. The characteristics of the method are as follows: (1) Thinning of the background enables us to represent white areas of any shape as connected thin lines, or chains. Robustness for tilted page images is also achieved by this representation. (2) Based on this representation, the task of page segmentation is defined as finding loops that enclose printed areas. The task is achieved by eliminating unnecessary chains using not only a feature of white areas, but also a feature of the black areas divided by a chain. Based on the experimental results and a comparison with previous methods, we discuss the advantages and limitations of the proposed method.
Page segmentation is the process of extracting components such as columns, figures, tables, and photos from a document image. This article proposes a page segmentation technique based on analysis of the white region (background) of the document image; the technique is stable regardless of component shape or image tilt. When a document has non‐rectangular or tilted components, the boundaries between components, that is, the white regions, can take any shape. Thus, important questions include how to represent white regions and how to process them. The proposed method represents white regions by the thin lines obtained by thinning them. Based on this representation, page segmentation is defined as extracting the loops that surround the components. The method extracts loops by eliminating unnecessary thin lines, for example, those that represent line spacing and character spacing. It attempts to use features not only of white regions but also of black regions, so that several kinds of document layout can be processed. This paper examines the effectiveness and limitations of the proposed method based on experimental results from 80 sample images tilted from 0 to 45 degrees. © 1998 Scripta Technica. Syst Comp Jpn, 29(3): 59–68, 1998
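The first step described above, representing the white region by thin lines, can be illustrated with a minimal sketch. The abstract does not say which thinning algorithm the authors use, so the code below substitutes the well-known Zhang-Suen thinning scheme as a stand-in. The names `PAGE`, `background_mask`, and `thin` are hypothetical, as is the toy page layout. The later steps (eliminating chains that correspond to line and character spacing, and extracting loops) are not shown.

```python
# Toy page: '#' is ink (black), '.' is background (white).
# This layout is an invented example, not data from the paper.
PAGE = [
    "............",
    ".####..####.",
    ".####..####.",
    ".####..####.",
    "............",
    ".##########.",
    ".##########.",
    "............",
]


def background_mask(page):
    """Return a 0/1 grid with 1 where the page is white and 0 where it is ink."""
    return [[1 if ch == "." else 0 for ch in row] for row in page]


def _neighbours(img, r, c):
    """8-neighbours of (r, c), clockwise from north; off-image pixels count as 0."""
    h, w = len(img), len(img[0])
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in offsets]


def thin(img):
    """Zhang-Suen thinning of a 0/1 grid (assumed stand-in for the paper's thinning)."""
    img = [row[:] for row in img]  # work on a copy
    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of Zhang-Suen
            to_clear = []
            for r in range(len(img)):
                for c in range(len(img[0])):
                    if not img[r][c]:
                        continue
                    p = _neighbours(img, r, c)  # p[0]=N, p[2]=E, p[4]=S, p[6]=W
                    b = sum(p)  # number of set neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if step == 0:
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Delete in one batch so every test in a pass sees the same image.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


if __name__ == "__main__":
    skeleton = thin(background_mask(PAGE))
    for row in skeleton:
        print("".join("o" if v else " " for v in row))
```

Because the thinning is applied to the white pixels rather than to the ink, the result is a set of thin white paths running between and around the printed blocks, whatever their shape or orientation. This is the kind of representation from which a method like the one described could then prune chains and look for enclosing loops.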