PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification

Obaidullah, Sk Md; Halder, Chayan; Santosh, K. C.; Das, Nibaran; Roy, Kaushik

doi:10.1007/s11042-017-4373-y

Cited by 72 publications

(16 citation statements)

References 29 publications


“…We have used the Cmaterdb dataset version 1.1.1 [37] and 1.5.1 [38], ICDAR 2013 Segmentation dataset [39], and PHDIndic_11 dataset [40] for performing skew correction and segmentation of the word images. Cmaterdb dataset version 1.1.1 and 1.5.1 are two benchmark datasets comprising of Bangla and Devanagari documents, respectively.…”

Section: Experimental Results and Analysis (mentioning)

confidence: 99%

Segmentation‐based recognition system for handwritten Bangla and Devanagari words using conventional classification and transfer learning

Pramanik, Bag. 2020. IET Image Processing


“…Modified log-Gabor filter (MLG) was used for feature extraction to develop bi-script (Devanagari-Roman and Bangla-Roman) and tri-script (Bangla-Devanagari-Roman) word-level script identification modules. In 2017, Obaidullah et al [12] presented a handwritten document image dataset at page-level named PHDIndic_11 having 11 officially recognized Indic scripts: Devanagari, Bangla, Urdu, Roman, Oriya, Gujarati, Gurumukhi, Tamil, Malayalam, Telugu, and Kannada. The paper also contained the results for handwritten script identification (HSI).…”

Section: Related Study (mentioning)

confidence: 99%

“…The extracted text blocks also have a chance of containing lines of varying size, thickness, and white spaces between characters, lines, and words. Instead of performing any homogenizing … $\text{Classification Accuracy (\%)} = \dfrac{\#\,\text{successfully classified components}}{\#\,\text{total components present}} \times 100$ (12) …”

Section: Preparation Of Handwritten Indic Script Database (mentioning)

confidence: 99%

A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem

Guha, Ghosh, Singh, et al. 2021. Complex Intell. Syst.


In any multi-script environment, handwritten script classification is an unavoidable pre-requisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, this complex pattern classification problem has been solved by researchers proposing various feature vectors mostly having large dimensions, thereby increasing the computation complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them only to the essential and relevant features. In the present work, we have addressed this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied over three feature vectors introduced in the literature recently—Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely, Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, consisting of officially recognized 12 Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The codes used for implementing HSGFS can be found in the following Github link: https://github.com/Ritam-Guha/HSGFS.
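The classification accuracy that this paper reports (its equation 12, quoted in the citation statement above) is a ratio of counts expressed as a percentage. A minimal Python sketch follows; the function name, argument names, and example counts are illustrative assumptions and are not taken from the paper or its HSGFS repository.

```python
def classification_accuracy(num_correct: int, num_total: int) -> float:
    """Return the percentage of components that were classified successfully.

    Hypothetical helper: implements
    accuracy (%) = (# successfully classified / # total present) * 100.
    """
    if num_total <= 0:
        raise ValueError("num_total must be a positive integer")
    if not 0 <= num_correct <= num_total:
        raise ValueError("num_correct must lie between 0 and num_total")
    return 100.0 * num_correct / num_total


if __name__ == "__main__":
    # Made-up counts purely for illustration, not results from the paper.
    print(f"{classification_accuracy(45, 50):.1f}%")
```

The same function applies at block, text-line, or word level, since only the definition of a "component" changes.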


“…Features are extracted from sample images of handwritten text in 11 scripts at block, text‐line, and word levels. In Obaidullah, Halder, Santosh, Das, and Roy (), the authors have self‐prepared the PHDIndic_11 page‐level dataset (containing a total of 1,458 pages of handwritten samples in 11 official scripts of India) and compared its classification accuracy using script‐dependent and script‐independent feature sets at page‐level. MLP and simple logistic (SL) classifiers as well as a metaclassifier that combines both MLP and SL are used, and a 116‐element feature vector is extracted from each of the text images.…”

Section: Related Work (mentioning)

confidence: 99%

A clustering‐based feature selection framework for handwritten Indic script classification

et al. 2019


In India, which has numerous officially recognized scripts, there is a primary need for categorizing the documents on the basis of the scripts used therein. Identification of script used in a document is essential for its effective handling both manually and digitally. Identification of script in a document image is an important research problem in the pattern recognition field, which, at times, suffers from the issue of growing dimensionality of the feature vector and requires an efficient feature selection technique. Keeping this fact in mind, in this paper, we propose a clustering‐based filter feature selection framework in order to extract an optimal and effective feature subset from the original feature vector. The present feature selection methodology is evaluated on a script classification problem involving handwritten documents in 12 major Indic scripts. Experiments are done at word‐level, text‐line‐level, and block‐level. Experiments demonstrate that a reasonable increment in classification accuracy has been realized using comparatively lesser number of features. The proposed framework for feature selection is computationally inexpensive and can be applied to other pattern recognition problems as well.
