2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
DOI: 10.1109/das.2018.35
Learning Deep Representations for Word Spotting under Weak Supervision

Abstract: Convolutional Neural Networks have made their mark in various fields of computer vision in recent years. They have achieved state-of-the-art performance in the field of document analysis as well. However, CNNs require a large amount of annotated training data and, hence, great manual effort. In our approach, we introduce a method to drastically reduce the manual annotation effort while retaining the high performance of a CNN for word spotting in handwritten documents. The model is learned with weak supervision…


Cited by 37 publications (28 citation statements)
References 24 publications
“…One possible augmentation strategy for word spotting is to apply different image transformations, such as shear, rotation, and translation, to the image, as proposed by Sudholt and Fink [25]. Gurjar et al. [11] have shown that pre-training a CNN-based word spotting approach on the synthetic dataset by Krishnan and Jawahar [15] can achieve reasonable word spotting performance even with only a few training samples. Since the improvements achieved by both of these methods are independent of the particular training set, it is likely that the gains from augmentation, pre-training, and sample selection will add up.…”
Section: Related Work
confidence: 99%
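The shear/rotation/translation augmentation described above can be sketched as composing random affine transforms. This is a minimal NumPy illustration; the parameter ranges below are hypothetical assumptions, not the values used by Sudholt and Fink [25]:

```python
import numpy as np

def random_affine_params(rng, max_shear=0.2, max_rot_deg=5.0, max_shift=3.0):
    """Sample a random shear/rotation/translation as one 3x3 homogeneous
    matrix. Parameter ranges here are illustrative assumptions only."""
    shear = rng.uniform(-max_shear, max_shear)
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    shear_m = np.array([[1.0, shear, 0.0],
                        [0.0, 1.0,   0.0],
                        [0.0, 0.0,   1.0]])
    rot_m = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0,            0.0,           1.0]])
    trans_m = np.array([[1.0, 0.0, tx],
                        [0.0, 1.0, ty],
                        [0.0, 0.0, 1.0]])
    # Apply shear first, then rotation, then translation.
    return trans_m @ rot_m @ shear_m

rng = np.random.default_rng(0)
M = random_affine_params(rng)
# The top two rows M[:2] could be passed to an image-warping routine,
# e.g. cv2.warpAffine(word_img, M[:2], (w, h)), to produce one augmented sample.
```

Because the samples are drawn per training image, each epoch effectively sees a new variant of every annotated word, which is the point of augmentation as a substitute for more labeled data.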
“…In this paper, we use a PHOCNet with a temporal pyramid pooling (TPP) layer, as described by Sudholt and Fink [25]. As described by Gurjar et al. [11], we train the PHOCNet using stochastic gradient descent with binary cross-entropy as the loss function when training to predict PHOCs.…”
Section: Training Setup
confidence: 99%
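The loss mentioned above treats each entry of the binary PHOC vector as an independent attribute. A minimal NumPy sketch of per-attribute binary cross-entropy on sigmoid outputs follows; the toy 4-dimensional "PHOC" and the clipping constant are illustrative assumptions (real PHOC vectors have hundreds of dimensions):

```python
import numpy as np

def bce_loss(pred_logits, phoc_targets):
    """Mean binary cross-entropy between sigmoid(logits) and a
    binary PHOC attribute vector."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))      # sigmoid activation
    p = np.clip(p, 1e-7, 1.0 - 1e-7)            # avoid log(0)
    return -np.mean(phoc_targets * np.log(p)
                    + (1.0 - phoc_targets) * np.log(1.0 - p))

# Toy example: a hypothetical 4-d attribute vector and network logits.
phoc_target = np.array([1.0, 0.0, 0.0, 1.0])
logits = np.array([2.0, -1.0, 0.5, 3.0])
loss = bce_loss(logits, phoc_target)
```

In an SGD step this scalar would be backpropagated through the network; deep-learning frameworks provide the same loss with built-in numerical stabilization (e.g. a combined sigmoid-plus-BCE operation).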
“…Even though the attribute CNN approach has shown excellent performance on numerous commonly used academic benchmarks, this comes at the cost of requiring training material. Works such as [5] and [6] try to alleviate the data problem through transfer learning and by incorporating synthetic data, but the need for representative training data remains inherent to any machine-learning-based approach.…”
Section: A. Word Spotting
confidence: 99%
“…The use of convolutional neural networks [23, 24] increased the performance of word spotting systems, but these networks need a training set with a large amount of annotated data to be trained. Many solutions have been proposed for improving word spotting performance without increasing the size of the training set: sample selection [25], data augmentation [23], transfer learning [26, 27], training on synthetic data [22, 28], and relaxed feature matching [29].…”
Section: Introduction
confidence: 99%