Neural Ctrl-F: Segmentation-Free Query-by-String Word Spotting in Handwritten Manuscript Collections

Wilkinson, Tomas; Lindström, Jonas; Brun, Anders

doi:10.1109/iccv.2017.475

Cited by 39 publications

(40 citation statements)

References 30 publications

(75 reference statements)

“…There are many successful applications of metric learning [11,3,27,33], such as ranking, image retrieval, face verification, speaker verification and so on. By far, applications of metric learning on document analysis or text reading were limited to the problem of word spotting and verification [1,26,34]. In this work, we verify the effectiveness of deep metric learning in text detection task.…”

Section: Related Work (mentioning)

confidence: 65%

Detecting Text in the Wild with Deep Character Embedding Network

Zhang

Sun

et al. 2019

Computer Vision – ACCV 2018

Most text detection methods hypothesize that text is horizontal or multi-oriented and thus define quadrangles as the basic detection unit. However, text in the wild is usually perspectively distorted or curved, which cannot be easily tackled by existing approaches. In this paper, we propose a deep character embedding network (CENet) which simultaneously predicts the bounding boxes of characters and their embedding vectors, thus making text detection a simple clustering task in the character embedding space. The proposed method does not require the strong assumption that text forms a straight line, which provides flexibility on arbitrarily curved or perspectively distorted text. For the character detection task, a dense prediction subnetwork is designed to obtain the confidence scores and bounding boxes of characters. For the character embedding task, a subnet is trained with contrastive loss to project detected characters into the embedding space. The two tasks share a backbone CNN from which the multi-scale feature maps are extracted. The final text regions can be obtained by a thresholding process on character confidence and the embedding distance of character pairs. We evaluated our method on ICDAR13, ICDAR15, MSRA-TD500, and Total-Text. The proposed method achieves state-of-the-art or comparable performance on all of the datasets, and shows a substantial improvement on the irregular-text dataset, i.e. Total-Text.
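
The grouping step mentioned in the abstract above (thresholding on character confidence and on the embedding distance of character pairs) can be illustrated with a small sketch. This is not the CENet authors' implementation: the function name `group_characters`, the threshold values, the Euclidean distance, and the use of union-find to merge linked pairs are all assumptions made for illustration, and the real system may also use spatial cues that the abstract does not describe.

```python
import math


def group_characters(chars, conf_thresh=0.5, dist_thresh=1.0):
    """Group detected characters into candidate text regions.

    chars: list of (confidence, embedding) pairs, where embedding is a
    sequence of floats. Thresholds are illustrative assumptions.
    Returns a sorted list of groups, each a list of indices into chars.
    """
    # Step 1: keep only detections whose confidence passes the threshold.
    kept = [i for i, (conf, _) in enumerate(chars) if conf >= conf_thresh]

    # Step 2: union-find over the kept indices.
    parent = {i: i for i in kept}

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 3: link every pair whose embeddings are closer than dist_thresh.
    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            if math.dist(chars[a][1], chars[b][1]) < dist_thresh:
                parent[find(a)] = find(b)

    # Step 4: collect connected components as text regions.
    groups = {}
    for i in kept:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical detections: (confidence, 2-D embedding).
example = [
    (0.90, (0.0, 0.0)),
    (0.80, (0.1, 0.0)),
    (0.95, (5.0, 5.0)),
    (0.20, (0.05, 0.0)),  # dropped: confidence below threshold
    (0.70, (5.2, 5.1)),
]
regions = group_characters(example)
print(regions)
```

Merging linked pairs transitively means a chain of neighbouring characters ends up in one region even when its two ends are far apart in embedding space, which is one plausible reading of "clustering" here.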

“…In [64], authors propose a hybrid approach where the document image is first subjected to dense text detection using sliding windows and later the word hypothesises are computed using the set of extremal regions. The third category of methods [6,84,85] in the segmentation-free setting is inspired by the recent success of region proposal based object detection techniques such as Faster R-CNN [57]. The Ctrl-F-Net [84] model proposes an end to end trainable detection and embedding network.…”

Section: Segmentation-free Approaches (mentioning)

confidence: 99%

HWNet v2: an efficient word image representation for handwritten documents

Krishnan

Jawahar

2019

IJDAR

We present a framework for learning an efficient holistic representation for handwritten word images. The proposed method uses a deep convolutional neural network with a traditional classification loss. The major strengths of our work lie in: (i) the efficient usage of synthetic data to pre-train a deep network, (ii) an adapted version of the ResNet-34 architecture with region of interest pooling (referred to as HWNet v2) which learns discriminative features for variable-sized word images, and (iii) a realistic augmentation of training data with multiple scales and distortions which mimics the natural process of handwriting. We further investigate the process of transfer learning to reduce the domain gap between the synthetic and real domains, and also analyze the invariances learned at different layers of the network using visualization techniques proposed in the literature. Our representation leads to state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size. On the challenging IAM dataset, our method is the first to report an mAP of around 0.90 for word spotting with a representation size of just 32 dimensions. Furthermore, we also present results on printed document datasets in English and Indic scripts, which validate the generic nature of the proposed framework for learning word image representations.

“…The EoIs assigned to the decoders are Passport Number, Name, (Gender, Birth Date), Birth Place, (Issue Place, Expiry Date). Nine decoders are set to cover ten EoIs for business card, and decoding steps of each decoder are 21,13,21,21,21,21,32,10,21. The EoIs of each decoder are Telephone, Postcode, Mobile, URL, Email, FAX, Address, (Name, Title), Company.…”

Section: B Experiments Setting (mentioning)

confidence: 99%

EATEN: Entity-Aware Attention for Single Shot Visual Text Extraction

Qin

Liu

et al. 2019

2019 International Conference on Document Analysis and Recognition (ICDAR)

Extracting entities from images is a crucial part of many OCR applications, such as entity recognition of cards, invoices, and receipts. Most existing works employ the classical detection-and-recognition paradigm. This paper proposes an Entity-aware Attention Text Extraction Network called EATEN, which is an end-to-end trainable system that extracts the entities without any post-processing. In the proposed framework, each entity is parsed by its corresponding entity-aware decoder. Moreover, we introduce a state transition mechanism which further improves the robustness of entity extraction. Given the absence of public benchmarks, we construct a dataset of almost 0.6 million images in three real-world scenarios (train ticket, passport and business card), which is publicly available at https://github.com/beacandler/EATEN. To the best of our knowledge, EATEN is the first single shot method to extract entities from images. Extensive experiments on these benchmarks demonstrate the state-of-the-art performance of EATEN.
