2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR)
DOI: 10.1109/icfhr.2016.0060
PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents

Abstract: In recent years, deep convolutional neural networks have achieved state-of-the-art performance in various computer vision tasks such as classification, detection or segmentation. Due to their outstanding performance, CNNs are increasingly used in the field of document image analysis as well. In this work, we present a CNN architecture that is trained with the recently proposed PHOC representation. We show empirically that our CNN architecture is able to outperform state-of-the-art results for various word spotting…

Cited by 197 publications (206 citation statements)
References 21 publications
“…The core of the proposed method consists of using a CNN as a feature extractor. We have used PHOCnet [2], a CNN architecture recently proposed for segmentation-based KWS. PHOCnet was the best performing model on the recent ICFHR 2016 KWS competition (unpenalized MAP scenario) [1].…”
Section: Methods and Model Parameters (mentioning)
confidence: 99%
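The statement above describes PHOCNet being used purely as a CNN feature extractor for segmentation-based keyword spotting (KWS). As a rough, non-authoritative illustration of that usage, the sketch below builds a small stand-in network in PyTorch (the backbone layers, the PHOC length of 604, and all helper names are assumptions, not the authors' implementation) and ranks candidate word images against a query by cosine similarity of their predicted attribute vectors.

```python
# Minimal sketch: a CNN that maps a word image to a PHOC-like attribute vector,
# used here only as a feature extractor for query-by-example word spotting.
# The layer configuration is a stand-in, not the actual PHOCNet architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordFeatureExtractor(nn.Module):
    def __init__(self, phoc_size=604):  # 604 is a commonly used PHOC length (assumption)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        # Global pooling stands in for the SPP layer so the fully connected
        # part accepts word images of arbitrary size.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, phoc_size)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return torch.sigmoid(self.fc(x))  # PHOC attributes are binary, hence sigmoid

def rank_candidates(query_img, candidate_imgs, model):
    """Rank candidate word images (each of shape (1, H, W)) against a query."""
    with torch.no_grad():
        q = model(query_img.unsqueeze(0))
        c = torch.cat([model(img.unsqueeze(0)) for img in candidate_imgs])
        sims = F.cosine_similarity(q, c)
    return sims.argsort(descending=True)
```

In this query-by-example setting the query descriptor is simply compared against every candidate descriptor; the original PHOCNet replaces the global pooling above with an SPP layer, which the next citation statement discusses.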
“…Regarding further details on the network architecture, as well as details on how training is performed (parameters, number of iterations, use of dropout, etc.), the reader is referred to the original publication [2]. All layers between the input layer and the SPP layer are of variable size, as they depend on the input word image size.…”
Section: Neural Network Architecture and Deep Features (mentioning)
confidence: 99%
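The passage above notes that every layer up to the SPP layer is variable-sized because it depends on the input word image size; it is the spatial pyramid pooling step that produces a fixed-length output for the subsequent fully connected layers. The minimal sketch below (PyTorch assumed; the pyramid levels 1, 2, 4 are illustrative and not necessarily those used in PHOCNet) shows how pooling into fixed grids makes the descriptor length independent of the input resolution.

```python
# Minimal sketch of spatial pyramid pooling (SPP): a feature map of arbitrary
# height/width is pooled into fixed grids, yielding a vector whose length
# depends only on the channel count and the pyramid levels.
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """feature_map: tensor of shape (N, C, H, W) with arbitrary H, W."""
    n, c = feature_map.shape[:2]
    pooled = []
    for level in levels:
        # Max-pool the map into a level x level grid regardless of H and W.
        p = F.adaptive_max_pool2d(feature_map, output_size=level)
        pooled.append(p.reshape(n, c * level * level))
    return torch.cat(pooled, dim=1)  # fixed length: C * sum(level**2)

# Two word images of different sizes yield descriptors of identical length.
a = spatial_pyramid_pool(torch.randn(1, 128, 17, 90))
b = spatial_pyramid_pool(torch.randn(1, 128, 25, 140))
assert a.shape == b.shape == (1, 128 * (1 + 4 + 16))
```

Because the output length never changes, word images can be fed through the convolutional part at their native resolution, without resizing or padding to a fixed input size.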