K. Pramod Sankar scite author profile

This paper describes the challenges facing the document image analysis community in building large digital libraries with diverse document categories. The challenges are identified from the experience of ongoing activities toward digitizing and archiving one million books. A smooth workflow has been established for archiving large quantities of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising from the diversity of the content in digital libraries.


Nearest neighbor based collection OCR

Sankar

Jawahar

Manmatha

2010


Conventional optical character recognition (OCR) systems operate on individual characters and words, and do not normally exploit document or collection context. We describe a Collection OCR which takes advantage of the fact that multiple examples of the same word (often in the same font) may occur in a document or collection. The idea here is that an OCR or a reCAPTCHA-like process generates a partial set of recognized words. In the second stage, a nearest neighbor algorithm compares the remaining word-images to those already recognized and propagates labels from the nearest neighbors. It is shown that by using an approximate fast nearest neighbor algorithm based on Hierarchical K-Means (HKM), we can do this accurately and efficiently. It is also shown that profile-based features perform much better than SIFT and Pyramid Histogram of Gradient (PHOG) features. We believe that this is because profile features are more robust to word degradations (common in our documents). This approach is applied to a collection of Telugu books, a language for which no commercial OCR exists. We show from a selection of 33 Telugu books that, starting with OCR labels for only 30% of the collection, we can recognize the remaining 70% of the words in the collection with 70% accuracy using this approach. Since the approach makes no language-specific assumptions, it should be applicable to a large number of languages. In particular, we are interested in its applicability to Indic languages and scripts.
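The second stage described in the abstract, propagating labels from recognized word-images to unrecognized ones, can be illustrated with a minimal sketch. It assumes each word-image has already been reduced to a fixed-length feature vector (the feature extraction is not shown), and it swaps in an exact brute-force nearest neighbor search in place of the approximate HKM search the abstract mentions. The function name `propagate_labels` and the toy vectors are hypothetical.

```python
from math import dist


def propagate_labels(labeled, unlabeled):
    """Copy to each unlabeled vector the label of its nearest labeled vector.

    labeled:   list of (feature_vector, word_label) pairs, e.g. from an OCR pass
    unlabeled: list of feature vectors for word-images that are not yet recognized
    Note: this is an exact brute-force search, not the approximate HKM search
    named in the abstract; it is only meant to show the propagation step.
    """
    predictions = []
    for query in unlabeled:
        # Pick the labeled example with the smallest Euclidean distance.
        _, nearest_label = min(labeled, key=lambda item: dist(item[0], query))
        predictions.append(nearest_label)
    return predictions


# Toy feature vectors (hypothetical values, two dimensions for readability).
recognized = [((0.0, 0.0), "word_a"), ((10.0, 10.0), "word_b")]
remaining = [(1.0, 1.0), (9.0, 8.0)]
predicted = propagate_labels(recognized, remaining)
```

Brute force costs one distance computation per labeled example for every query, which is why an approximate index becomes attractive at collection scale.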


Subtitle-free Movie to Script Alignment

Sankar

Jawahar

Zisserman

2009


A standard solution for aligning scripts to movies is to use dynamic time warping with the subtitles (Everingham et al., BMVC 2006). We investigate the problem of aligning scripts to TV video/movies in cases where subtitles are not available, e.g. in the case of silent films or for film passages which are non-verbal. To this end we identify a number of "modes of alignment" and train classifiers for each of these. The modes include visual features, such as locations and face recognition, and audio features such as speech. In each case the feature gives some alignment information, but is too noisy when used independently. We show that combining the different features into a single cost function, and optimizing it using dynamic programming, leads to performance superior to that of each individual feature. The method is assessed on episodes from the situation comedy Seinfeld, and on Charlie Chaplin and Indian movies.
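The idea of merging several noisy cues into one cost function and optimizing it with dynamic programming can be sketched as follows. This is a generic dynamic time warping illustration, not the paper's actual formulation: the choice of script sentences and video shots as alignment units, the weighted-sum combination, the weights, and the names `combine_costs` and `align` are all assumptions made for the example.

```python
def combine_costs(feature_costs, weights):
    """Weighted sum of per-feature cost matrices that all share one shape.

    feature_costs[k][i][j] is the cost, under feature k, of matching script
    unit i to video unit j (hypothetical units: sentences and shots).
    """
    rows, cols = len(feature_costs[0]), len(feature_costs[0][0])
    return [
        [sum(w * c[i][j] for w, c in zip(weights, feature_costs)) for j in range(cols)]
        for i in range(rows)
    ]


def align(cost):
    """Find the cheapest monotonic path from (0, 0) to the last cell."""
    n, m = len(cost), len(cost[0])
    total = [[float("inf")] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    total[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Allowed predecessors: up, left, and diagonal.
            candidates = []
            if i > 0:
                candidates.append((total[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((total[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((total[i - 1][j - 1], (i - 1, j - 1)))
            best, prev = min(candidates)
            total[i][j] = cost[i][j] + best
            back[i][j] = prev
    # Walk the back-pointers from the last cell to recover the path.
    path, cell = [], (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return total[n - 1][m - 1], path


# Toy cost matrices for two cues (hypothetical values).
face_cost = [[0, 1], [1, 0]]
speech_cost = [[0, 1], [1, 0]]
combined = combine_costs([face_cost, speech_cost], weights=(0.5, 0.5))
total_cost, path = align(combined)
```

A weighted sum is the simplest way to merge cues; in practice the weights would need to be tuned or learned, which this sketch does not attempt.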


Probabilistic Reverse Annotation for Large Scale Image Retrieval

Sankar

Jawahar

2007



K. Pramod Sankar

Adapting off-the-shelf CNNs for word spotting & recognition

Digitizing a Million Books: Challenges for Document Analysis

Nearest neighbor based collection OCR

Subtitle-free Movie to Script Alignment

Probabilistic Reverse Annotation for Large Scale Image Retrieval
