2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00452

A Multiplexed Network for End-to-End, Multilingual OCR

Abstract: This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents because of wild content and varied image backgrounds. We propose to uniformly use word error rate (WER) as a new measurement for evaluating scene-text OCR, covering both end-to-end (e2e) performance and the performance of individual system components. For the e2e metric in particular, we name it DISGO WER, as it accounts for Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally we…
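As context for the metric named in the abstract, here is a minimal sketch of a standard word error rate computed by Levenshtein alignment. This is an illustration only: the paper's DISGO WER additionally penalizes grouping/ordering errors, which this sketch omits, and unit costs of 1 per edit are an assumption.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("a b c d", "a x c")` aligns one substitution (`b` vs `x`) and one deletion (`d`), giving 2/4 = 0.5.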

Cited by 40 publications (14 citation statements)
References 60 publications
“…In this section, we experimentally validate our proposed TANGER by comparing its performance with state-of-the-art methods on several public datasets as well as one newly collected multilingual dataset, TsiText. First, we examine the performance of TANGER for multilingual scene text recognition in comparison with two end-to-end methods [10, 22] and one dictionary-guided method [34]. Then, we compare our model with the vision transformer ViTSTR [5] in three variants, i.e., tiny, small, and base, for monolingual scene text recognition.…”
Section: Results
Confidence: 99%
“…Busta et al. [10] designed arguably the first method for multilingual scene text recognition, called E2E-MLT, built on a single fully convolutional network and shown to be competitive even on rotated and vertical text instances. An E2E approach to script identification with different recognition heads, called Multiplexed Multilingual Mask TextSpotter, was proposed in [22]; it supports removing existing languages or adding new ones.…”
Section: A Scene Text Recognition
Confidence: 99%
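The multiplexed design described in the statement above can be sketched as a registry that routes each detected text region, via script identification, to a language-specific recognition head. All class, function, and parameter names below are illustrative assumptions, not the paper's actual API; the point is that adding or removing a language only touches the registry.

```python
class MultiplexedRecognizer:
    """Routes text regions to per-script recognition heads (illustrative sketch)."""

    def __init__(self):
        self.heads = {}  # script name -> recognition callable

    def register_head(self, script, head):
        # Extensibility: supporting a new language means registering one
        # more head here; removal means deleting its entry.
        self.heads[script] = head

    def recognize(self, region, identify_script):
        # identify_script stands in for the script-identification module.
        script = identify_script(region)
        if script not in self.heads:
            raise KeyError(f"no recognition head registered for script {script!r}")
        return self.heads[script](region)
```

Usage under these assumptions: register a head per script, then call `recognize` with a region and a script-identification function.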
“…Our method aims at use cases involving creative self-expression and augmented reality (e.g., photo-realistic translation, leveraging multilingual OCR technologies [59]). Our method can be used for data generation and augmentation when training future OCR systems, as successfully done by others [49], [60] and in other domains [61], [62].…”
Section: Discussion
Confidence: 99%
“…Also, seq2seq was used in aspect term extraction, where the source sequence and target sequence are composed of words and labels, respectively (Ma et al., 2019). Huang et al. (2019) proposed an end-to-end approach that can recognize multiple languages in images while accounting for data imbalance between languages. Lewis et al. (2020) proposed a denoising autoencoder named BART, which combines bidirectional and auto-regressive Transformers for pretraining sequence-to-sequence models; after fine-tuning, BART is effective for text generation.…”
Section: Sequence To Sequence
Confidence: 99%