Improving patch-based scene text script identification with ensembles of conjoined networks

Gómez, Lluís; Nicolaou, Anguelos; Karatzas, Dimosthenis

doi:10.1016/j.patcog.2017.01.032

Cited by 75 publications

(35 citation statements)

References 59 publications

“…-We experimentally demonstrate that script identification is not required to recognize multi-language text. Unlike competing methods [12,33], E2E-MLT performs script identification from the OCR output using a simple majority voting mechanism over the predicted characters.…”

Section: Introduction (mentioning)

confidence: 99%

E2E-MLT - An Unconstrained End-to-End Method for Multi-language Scene Text

Bušta

Patel

Matas

2019

Lecture Notes in Computer Science

An end-to-end trainable (fully differentiable) method for multilanguage scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem.
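The citation statement above describes script identification by majority voting over the characters predicted by the OCR. A minimal sketch of that idea follows. It is not the E2E-MLT implementation: the function names `char_script` and `vote_script` are hypothetical, and taking the first word of each character's Unicode name as its script label is a rough assumption made only for illustration.

```python
import unicodedata
from collections import Counter


def char_script(ch):
    """Rough script label: first word of the Unicode character name (assumption)."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Character has no Unicode name.
        return None


def vote_script(text):
    """Return the most frequent script label among the letters of `text`."""
    votes = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, spaces and punctuation do not vote
        label = char_script(ch)
        if label is not None:
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(vote_script("Привет 2019"))
```

Because every letter casts one vote, a few misrecognized characters in a word do not change the predicted script as long as most characters are correct.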

“…The MLP has exhibited a maximum accuracy of 90% among them and time complexity is significantly high. Lluis Gomez et al [10] have presented the scene text script identification by the improved patch-based method.…”

Section: Literature Survey (mentioning)

confidence: 99%

“…[12], and MLe2e. [8]. The CVSI-2015 dataset contains 10 scripts namely; English, Hindi, Bengali, Oriya, Gujarathi, Punjabi, Kannada, Tamil, Telugu, Arab and the dataset size of 10,665; SIW-13 has the 13 scripts; Tibetan, Thai, Russian, Mongolian, Korean, Kannada, Japanese, Hebrew, Greek, English, Chinese, Cambodian, and Arabic.…”

Section: Data Collection (mentioning)

confidence: 99%

Camera-Based Bi-lingual Script Identification at Word Level using SFTA Features

Dhandra

Mallappa

Mukarambi

2019

IJRTE

Abstract: Most documents in application areas such as government, business and research are available as bi-lingual or multi-lingual text documents. Such documents are captured by video or camera so that the script of the text can be identified for automatic reading and editing. In this paper, an attempt is made to address the problem of script identification from camera-captured document images using SFTA features. The input image is decomposed into a group of binary images by applying TTBD, with the number of thresholds fixed empirically at t_n = 3. From each decomposed binary image, Box Count, Mean Gray Level, and Pixel Count are extracted to form the feature vector. This feature vector is submitted to a K-NN classifier to identify the script of the input document image. In all, 10 scripts of Indian languages are considered, along with English as the common language, as bi-lingual documents. The novelty of the paper is that 7 features are selected as potential features to obtain the highest accuracy. The features Box Count (3), Mean Gray Level (2), and Pixel Count (2) obtained 87.02% recognition accuracy for the English and Hindi script combination on the collected dataset, and encouraging results for other combinations. These 7 potential features were selected from the set of all 18 features using a technique named feed-forward feature selection.

“…There were multistage training process and great computation due to clustering. Inspired by Siamese network [14], Gomez et al [15] proposed an improved patch-based method containing an ensemble of identical nets to learn discriminative stroke-part representations. Mei et al [16] adopted Convolutional Recurrent Neural Networks [17] to extract the image representation and spatial dependency which is discriminative in spite of sharing characters.…”

Section: Introduction (mentioning)

confidence: 99%

“…As for the problem of arbitrary aspect ratios, recent methods with good performance take densely cropped image patches with fixed size as input [12], [13], [15], [20]. They also employ data augmentation somehow, but they suffered from the following three issues.…”

Section: Introduction (mentioning)

confidence: 99%

Patch Aggregator for Scene Text Script Identification

Cheng

Huang

Bai

et al. 2019

2019 International Conference on Document Analysis and Recognition (ICDAR)

Script identification in the wild is of great importance in a multi-lingual robust-reading system. The scripts deriving from the same language family share a large set of characters, which makes script identification a fine-grained classification problem. Most existing methods make efforts to learn a single representation that combines the local features by making a weighted average or other clustering methods, which may reduce the discriminatory power of some important parts in each script for the interference of redundant features. In this paper, we present a novel module named Patch Aggregator (PA), which learns a more discriminative representation for script identification by taking into account the prediction scores of local patches. Specifically, we design a CNN-based method consisting of a standard CNN classifier and a PA module. Experiments demonstrate that the proposed PA module brings significant performance improvements over the baseline CNN model, achieving the state-of-the-art results on three benchmark datasets for script identification: SIW-13, CVSI 2015 and RRC-MLT 2017.
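The citation statements for this entry mention methods that take densely cropped, fixed-size image patches as input. The sketch below illustrates only that cropping step. It is not the Patch Aggregator module or any cited method: the names `patch_windows` and `crop_patches`, the patch size of 32, and the stride of 8 are hypothetical choices, and the image is assumed to be a list of pixel rows already resized to the patch height.

```python
def patch_windows(width, patch_size=32, stride=8):
    """Return (start, end) column ranges of fixed-size patches across `width`.

    Trailing columns that do not fill a whole stride step are dropped.
    """
    if width <= patch_size:
        return [(0, patch_size)]  # a narrow image yields one (padded) patch
    return [(s, s + patch_size) for s in range(0, width - patch_size + 1, stride)]


def crop_patches(rows, patch_size=32, stride=8, fill=0):
    """Densely crop square patches from a text-line image.

    `rows` is a list of equal-length pixel rows, assumed already resized so
    that len(rows) == patch_size. Narrow images are right-padded with `fill`.
    """
    if len(rows) != patch_size:
        raise ValueError("image height must equal patch_size")
    width = len(rows[0])
    if width < patch_size:
        rows = [list(r) + [fill] * (patch_size - width) for r in rows]
    windows = patch_windows(width, patch_size, stride)
    return [[list(r[s:e]) for r in rows] for s, e in windows]


image = [[0] * 100 for _ in range(32)]  # blank 32 x 100 text-line image
patches = crop_patches(image)
print(len(patches), len(patches[0]), len(patches[0][0]))
```

Cropping along the width at a fixed height sidesteps the arbitrary aspect ratios of text lines: every patch has the same shape regardless of how long the word is.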
