Camera-captured scene and born-digital image analysis helps in developing vision systems for robots to read, transliterate or translate text, to navigate, and to retrieve search results. However, text in such images does not follow any standard layout, and its location within the image is arbitrary. In addition, motion blur, non-uniform illumination, skew, occlusion and scale-based degradations increase the complexity of locating and recognizing text in a scene or born-digital image.

The OTCYMIST method [2] was proposed to segment text from born-digital images. This method won first place in ICDAR 2011 [9] and placed third in ICDAR 2013 [11] on the text segmentation task of the robust reading competitions for the born-digital image data set. Here, Otsu binarization and Canny edge detection are carried out separately on the three colour planes of the image. Connected components (CCs) obtained from the segmented image are pruned using thresholds on their area and aspect ratio, and CCs with sufficient edge pixels are retained. The centroids of the individual CCs form the nodes of a graph, over which a minimum spanning tree is built. Long edges are broken off the minimum spanning tree, and the pairwise height ratio is used to remove likely non-text components. CCs are then grouped based on their horizontal proximity to generate bounding boxes (BBs) of text strings. Overlapping BBs are removed using an overlap-area threshold; non-overlapping and minimally overlapping BBs are retained for text segmentation. Finally, these BBs are split vertically to localize text at the word level.

A word cropped from a document image can easily be recognized by a traditional optical character recognition (OCR) engine. However, recognizing a word manually cropped from a scene or born-digital image is not trivial, since existing OCR engines do not handle such word images effectively.
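The minimum-spanning-tree grouping step described above can be sketched as follows. This is a minimal illustration, not the OTCYMIST implementation: the function name, the `max_edge_len` parameter, and the use of Kruskal's algorithm are our assumptions. It exploits the fact that the connected groups obtained by removing all MST edges longer than a threshold coincide with the components of the graph restricted to edges not exceeding that threshold, so the "break long edges" step can be folded into the tree construction.

```python
import math
from itertools import combinations

def group_components(centroids, max_edge_len):
    """Sketch of MST-based CC grouping (names/parameters assumed).

    Build a minimum spanning tree over CC centroids with Kruskal's
    algorithm, skipping edges longer than max_edge_len (equivalent to
    building the full MST and then breaking its long edges), and
    return the resulting groups as sets of centroid indices.
    """
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Candidate edges sorted by Euclidean length between centroids.
    edges = sorted(
        (math.dist(centroids[i], centroids[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    # Accept an edge only if it joins two different trees AND is
    # short enough; long MST edges are "broken" by skipping them.
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and d <= max_edge_len:
            parent[ri] = rj

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())
```

With centroids `[(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]` and `max_edge_len = 2`, the long edge between the two clusters is broken, yielding the groups `{0, 1, 2}` and `{3, 4}`.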
Our intention is to first segment the word image and then pass it to an existing OCR engine for recognition. This is advantageous in two respects: it avoids building a character classifier from scratch, and it reduces the word recognition task to a word segmentation task. Here, we propose three bottom-up approaches to segment a cropped word image; these approaches choose different features at the initial stage of segmentation.

The power-law transform (PLT) [3] was applied to the pixels of grayscale born-digital images to non-linearly enhance the histogram. The recognition rate achieved on born-digital word images is 82.9%, more than 20 percentage points above the top-performing entry (61.5%) in the ICDAR 2011 [9] robust reading competition. Using PLT, the recognition rates on the born-digital and scene images of the ICDAR 2013 robust reading competition [10] are 82.7% and 64.6%, respectively.
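A power-law (gamma) transform of the kind mentioned above can be sketched as below; the function name and the choice of exponent are illustrative assumptions, not the exact parameters of [3].

```python
import numpy as np

def power_law_transform(gray, gamma=1.5):
    """Apply a power-law (gamma) transform to a grayscale image.

    Pixel intensities are normalised to [0, 1], raised to the power
    `gamma`, and rescaled to [0, 255]. gamma > 1 darkens mid-tones
    and gamma < 1 brightens them, non-linearly reshaping the
    intensity histogram before binarization.
    """
    norm = gray.astype(np.float64) / 255.0
    return np.clip(255.0 * norm ** gamma, 0, 255).astype(np.uint8)

# Example: with gamma = 2, a mid-gray pixel (128) maps to
# 255 * (128/255)^2 ~= 64, while black and white are unchanged.
img = np.array([[0, 128, 255]], dtype=np.uint8)
out = power_law_transform(img, gamma=2.0)
```

The extremes 0 and 255 are fixed points of the transform, so only the mid-range of the histogram is stretched or compressed, which is what makes the enhancement non-linear.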