Text, graphics and half-tones are the major constituents of any document page. While half-tones can be characterised by their inherent intensity variation, text and graphics share common characteristics and differ mainly in spatial distribution. The success of document image analysis systems depends on the proper segmentation of text and graphics, as text is further subdivided into other classes such as headings, tables and math zones. Segmentation of graphics is essential for better OCR performance and for vectorization in computer vision applications. Segmenting graphics from text is particularly difficult when the graphics are made of small components (dashed or dotted lines, etc.) that share many features with text. Here we propose a robust technique for segmenting all sorts of graphics and text in any orientation from document pages.
In this paper we propose a fully automatic hierarchical method for identification of forms using global as well as local features. Moments of certain orders are considered as global shape features and are utilised to reduce the search space by selecting a subset of the forms present in the database. The type of the candidate form is then identified within this subset through detailed analysis using local geometrical and topological features. The candidate form is then segmented to extract the user-filled information.
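The coarse stage described above, using moments as global shape features to shortlist candidate forms, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which moment orders or which distance measure are used, so the second-order normalized central moments, the Euclidean distance, and the names `moment_features`, `shortlist` and `rect` are all assumptions made for illustration.

```python
import math

def rect(h, w, top, left, height, width):
    """Hypothetical helper: an h x w binary image containing one filled rectangle."""
    return [[1 if top <= y < top + height and left <= x < left + width else 0
             for x in range(w)] for y in range(h)]

def moment_features(img, orders=((2, 0), (1, 1), (0, 2))):
    """Normalized central moments of a 2-D image given as a list of rows.

    The choice of orders is an assumption; the source only says
    "moments of certain orders".
    """
    m00 = sum(sum(row) for row in img)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    # Centroid, so that the features do not depend on translation.
    cx = sum(x * v for row in img for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(img) for v in row) / m00
    feats = []
    for p, q in orders:
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                 for y, row in enumerate(img) for x, v in enumerate(row))
        # Dividing by m00^(1 + (p+q)/2) makes the feature scale-normalized.
        feats.append(mu / m00 ** (1 + (p + q) / 2))
    return feats

def shortlist(query_img, database, k=3):
    """Return the names of the k database forms nearest to the query in moment space."""
    q = moment_features(query_img)
    dists = {name: math.dist(q, moment_features(img))
             for name, img in database.items()}
    return sorted(dists, key=dists.get)[:k]

if __name__ == "__main__":
    database = {
        "wide": rect(20, 20, 8, 2, 4, 16),
        "tall": rect(20, 20, 2, 8, 16, 4),
        "square": rect(20, 20, 5, 5, 10, 10),
    }
    query = rect(20, 20, 10, 3, 4, 16)  # the "wide" layout, shifted
    print(shortlist(query, database, k=2))
```

Only the shortlisted forms would then go on to the finer comparison with local geometrical and topological features, which this sketch does not cover.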