Automatic identification of handwritten scripts from document images facilitates many important applications, such as sorting, transcription of multilingual documents and indexing of large collections of such images, and can serve as a precursor to optical character recognition (OCR). In this paper, we investigate texture as a tool for determining the script of a handwritten document image, based on the observation that text has a distinct visual texture. Further, the K-nearest neighbour algorithm is used to classify 300 text blocks as well as 400 text lines into one of the three major Indian scripts: English, Devnagari and Urdu, based on 13 spatial spread features extracted using morphological filters. With a five-fold cross-validation test, the proposed algorithm attains average classification accuracies as high as 99.2% for bi-script and 88.6% for tri-script separation, at text line and text block level respectively.
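The classification step described above (K-nearest neighbour evaluated with five-fold cross-validation) can be sketched as follows. This is only an illustration under stated assumptions: the feature vectors are synthetic 2-D points rather than the 13 morphological spatial spread features, the fold assignment (sample i goes to fold i mod 5) and k = 3 are arbitrary choices, and the names `knn_predict` and `cross_val_accuracy` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, k=3, folds=5):
    """Mean accuracy over interleaved folds: sample i is held out in fold i % folds."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / folds

# Synthetic, well-separated toy features (an assumption, not data from the paper).
X = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 5.0) for i in range(10)]
y = ["Devnagari"] * 10 + ["Urdu"] * 10
accuracy = cross_val_accuracy(X, y)
```

On real data, each entry of `X` would be the feature vector of one text line or text block, and `accuracy` would be the average over the five held-out folds.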
This paper presents word-level handwritten script identification based on directional discrete cosine transforms (D-DCT). The conventional discrete cosine transform (DCT) emphasizes the vertical and horizontal energies of an image and de-emphasizes directional edge information, which plays a significant role in shape analysis problems in particular. The conventional DCT is therefore not efficient at characterizing images in which directional edges are dominant. In this paper, we investigate two different methods to capture directional edge information: one performs a 1D-DCT along the left and right diagonals of an image, and the other decomposes the 2D-DCT coefficients along the left and right diagonals. The means and standard deviations of the left and right diagonals of the DCT coefficients are computed and used to classify words with linear discriminant analysis (LDA) and K-nearest neighbour (K-NN). We validate the method on 9000 words belonging to six different scripts. Words are classified in bi-script, tri-script and multi-script scenarios, with average identification accuracies of 96.95%, 96.42% and 85.77%, respectively.
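The first of the two methods (a 1D-DCT along each diagonal, summarised by mean and standard deviation) might look roughly like the sketch below. Several details are assumptions, since the abstract does not specify them: the DCT is the unnormalised type-II form, coefficients from all diagonals of one direction are pooled before taking statistics, and the mapping of "left" and "right" to main-direction and anti-direction diagonals is a guess. The function names are hypothetical.

```python
import math

def dct_1d(x):
    """Unnormalised DCT-II of a sequence."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]

def diagonals(img, right=False):
    """All diagonals of a 2-D list; right=True mirrors the image to get anti-diagonals."""
    if right:
        img = [row[::-1] for row in img]
    rows, cols = len(img), len(img[0])
    return [[img[r][r + d] for r in range(rows) if 0 <= r + d < cols]
            for d in range(-(rows - 1), cols)]

def mean_std(values):
    """Mean and population standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def ddct_features(img):
    """Four features: (mean, std) of pooled diagonal DCT coefficients per direction."""
    feats = []
    for right in (False, True):
        coeffs = [c for diag in diagonals(img, right) for c in dct_1d(diag)]
        feats.extend(mean_std(coeffs))
    return feats

# Tiny synthetic "word image" (an assumption, not data from the paper).
word = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
features = ddct_features(word)
```

The resulting vector could then be passed to a classifier such as LDA or K-NN, as the abstract describes.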