“…The problem of layout analysis on newspaper data has been addressed by few researchers [8,7,9,10,17,18,20,16]. Gatos et al. [8] proposed a two-stage technique for layout analysis of newspaper pages.…”
Digitization of newspaper articles is important for recording historical events. Layout analysis of Indian newspapers is challenging because of varying font sizes and font styles and the arbitrary placement of text and non-text regions. In this paper we propose a novel framework for learning optimal parameters for text-graphic separation in the presence of complex layouts. The learning problem is formulated as an optimization problem and solved with the EM algorithm, which learns optimal parameters depending on the nature of the document content.
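The abstract does not say which model the EM algorithm fits, so the following is only an illustrative sketch under a stated assumption. It supposes that one scalar feature per connected component, such as its height in pixels, comes from a mixture of two Gaussians: one for text and one for graphics. EM then estimates the means, variances and mixing weights, and each component is assigned to whichever Gaussian is more responsible for it. The function names `em_two_gaussians` and `assign` and the choice of feature are hypothetical. They are not the authors' method.

```python
import math

def normal_pdf(x, mean, var):
    # Density of N(mean, var) at x.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, iters=100, min_var=1e-6):
    """Fit a 2-component 1-D Gaussian mixture with EM (hypothetical helper)."""
    xs = list(xs)
    n = len(xs)
    # Initialise the means at the extremes and both variances at the overall variance.
    mu = [min(xs), max(xs)]
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, min_var)
    var = [var_all, var_all]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each Gaussian for each sample.
        resp = []
        for x in xs:
            p = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
            pi[k] = nk / n
    return mu, var, pi

def assign(xs, mu, var, pi):
    # Label each sample with the index of its most responsible Gaussian.
    return [max(range(2), key=lambda k: pi[k] * normal_pdf(x, mu[k], var[k])) for x in xs]

# Hypothetical component heights: small glyph-sized blobs and large picture-sized blobs.
heights = [10, 11, 12, 10, 11, 80, 85, 90]
mu, var, pi = em_two_gaussians(heights)
labels = assign(heights, mu, var, pi)
```

Component 0 starts at the smallest value, so under this assumption it plays the role of "text" and component 1 plays "graphics". A real system would use richer, multi-dimensional features.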
“…The first one treats a word as a collection of simpler subunits such as characters and proceeds by segmenting the word into these units, identifying the units and building a word-level interpretation using the lexicon. The second one treats the word as a single, indivisible entity and attempts to recognize it using features of the word as a whole [1][2][3][4]. This approach is referred to as word-based or holistic and is inspired in part by psychological studies of human reading, which indicate that humans use features of word shape such as length, ascenders, and descenders in reading (Fig.…”
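The snippet names three word-shape features: length, ascenders and descenders. It does not say how they are measured, so what follows is one possible sketch. It assumes a binarized word image given as rows of `'#'` (ink) and `'.'` (background). It also assumes the row indices of the x-height line (`x_top`) and the baseline are already known. Length is the width of the inked span. An ascender or descender is counted as a maximal run of adjacent columns with ink above `x_top` or below `baseline`. The function `word_shape_features` and these conventions are hypothetical.

```python
def word_shape_features(rows, x_top, baseline):
    """Hypothetical holistic word-shape features from a binary word image."""
    width = max(len(r) for r in rows)

    def ink(r, c):
        # True if pixel (row r, column c) is inked; short rows are padded with background.
        return c < len(rows[r]) and rows[r][c] == '#'

    def column_has_ink(c, row_range):
        return any(ink(r, c) for r in row_range)

    def count_runs(flags):
        # Number of maximal runs of consecutive True values.
        runs, prev = 0, False
        for f in flags:
            if f and not prev:
                runs += 1
            prev = f
        return runs

    ink_cols = [c for c in range(width) if column_has_ink(c, range(len(rows)))]
    length = ink_cols[-1] - ink_cols[0] + 1 if ink_cols else 0
    ascenders = count_runs([column_has_ink(c, range(0, x_top)) for c in range(width)])
    descenders = count_runs([column_has_ink(c, range(baseline + 1, len(rows))) for c in range(width)])
    return {"length": length, "ascenders": ascenders, "descenders": descenders}

# A crude "hp"-like word: one stroke rises above the x-height line and one drops below the baseline.
word = [
    "#......",
    "#......",
    "###.###",  # x-height line (x_top = 2)
    "#.#.#.#",
    "#.#.###",  # baseline (baseline = 4)
    "....#..",
    "....#..",
]
features = word_shape_features(word, x_top=2, baseline=4)
```

Column runs are a deliberately coarse proxy. Two ascending letters that touch would be counted as one ascender.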