2018
DOI: 10.1007/s10032-018-0295-0

Attribute CNNs for word spotting in handwritten documents

Abstract: Word spotting has become a field of strong research interest in document image analysis over the last years. Recently, AttributeSVMs were proposed which predict a binary attribute representation [3]. At their time, this influential method defined the state-of-the-art in segmentation-based word spotting. In this work, we present an approach for learning attribute representations with Convolutional Neural Networks (CNNs). By taking a probabilistic perspective on training CNNs, we derive two different loss functio…

Citations: cited by 55 publications (45 citation statements)
References: 54 publications
“…Taking inspirations for word attributes, for handwritten images, Poznanski et al [52] adapted VGGNet [72] for recognizing phoc attributes by having multiple parallel fully connected layers, each one predicting phoc attributes at a particular level. In similar spirits, different architectures [35,74,76,83] were proposed using cnn networks which embed features into different textual embedding spaces defined by phoc. In [74], Sudholt et al proposes an architecture to directly embed image features to phoc attributes by having sigmoid activation in the final layer and thereby avoiding multiple fully connected layers as presented in [52].…”
Section: Deep Learning (mentioning)
confidence: 99%
“…The next set of methods in this space use the principle of attribute embedding framework using deep cnn networks. Here, PHOCNet [74] and TPP-PHOCNet [75,76] uses the output space of cnn as phoc embedding while Triplet-CNN [83] explores with different embeddings such as phoc, dctow and few semantic embeddings. In the table, we report the best performance of Triplet-CNN across different proposed embeddings.…”
Section: Architecture Evaluation (mentioning)
confidence: 99%
“…e.g. [2], [3]) require a previous segmentation of document pages into individual word images, which is in general not an easy to solve problem. The segmentation-free approach does not pose this requirement, but aims at solving the retrieval and segmentation problem jointly.…”
Section: A Word Spotting (mentioning)
confidence: 99%
“…Our baseline word spotting system is based on the design of [2]. The attribute embedding is a 4-level PHOC representation of partitions 1, 2, 4 and 8 based on the lower case Latin alphabet plus digits.…”
Section: A Word Spotting (mentioning)
confidence: 99%
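
The 4-level PHOC representation quoted in the last statement (partitions 1, 2, 4 and 8 over the lower-case Latin alphabet plus digits) can be illustrated with a minimal sketch. The function name `phoc` is hypothetical, and the assignment rule used here (a character belongs to a region when at least half of the character's extent falls inside it) is the one commonly used for PHOC in the wider literature; it is an assumption, since the excerpts above do not state it. With 1 + 2 + 4 + 8 = 15 regions and 36 symbols, the vector has 540 binary attributes.

```python
import string

# Assumed alphabet from the quoted statement: lower-case Latin letters plus digits.
ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols
LEVELS = (1, 2, 4, 8)  # partitions named in the quoted statement


def phoc(word, levels=LEVELS, alphabet=ALPHABET):
    """Return a binary PHOC-style attribute vector for `word` (illustrative sketch)."""
    word = word.lower()
    n = len(word)
    index = {ch: k for k, ch in enumerate(alphabet)}
    vector = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1 / (n * level) to avoid float rounding:
            # character i spans [i * level, (i + 1) * level],
            # region r spans [r * n, (r + 1) * n].
            r_start, r_end = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                if ch not in index:
                    continue  # symbols outside the alphabet are ignored
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Assumed rule: at least 50% of the character lies in the region.
                if 2 * overlap >= level:
                    bits[index[ch]] = 1
            vector.extend(bits)
    return vector


if __name__ == "__main__":
    v = phoc("word")
    print(len(v), sum(v))
```

Under this rule a short word activates few or no attributes at the finest level, because no single region covers half of any character.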