2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018)
DOI: 10.1109/cvpr.2018.00163

Edit Probability for Scene Text Recognition

Abstract: We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximum likelihood loss to optimize the models. When we train the model, the misalignment between the ground-truth strings and the attention's output sequences of probability distributions, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and…
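A minimal PyTorch sketch of the frame-wise loss the abstract refers to may help; the function name `frame_wise_nll` is hypothetical and not from the paper. It shows why a single missing or superfluous character is damaging: the loss pairs decoder step t with ground-truth position t, so one shift misaligns every later pair.

```python
# A minimal sketch (not the paper's code) of the frame-wise maximum
# likelihood loss used by attention decoders: decoding step t is scored
# against the ground-truth character at the same position t.
import torch
import torch.nn.functional as F

def frame_wise_nll(log_probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """log_probs: (T, num_classes) per-step log-probabilities from the decoder.
    target:    (T,) ground-truth character indices, position-aligned."""
    # NLL at step t uses target[t]; one missing or extra character shifts
    # every subsequent (log_probs[t], target[t]) pair out of alignment,
    # so all later steps are penalized against the wrong labels.
    return F.nll_loss(log_probs, target)
```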

Cited by 171 publications (115 citation statements) · References 28 publications
“…Shi et al. [31] improved such a CNN+RNN+CTC framework by making it end-to-end trainable, with significant performance gains. Recently, the framework was further improved by introducing various attention mechanisms, which are able to encode more character information explicitly or implicitly [33,4,1,12].…”
Section: Related Work
confidence: 99%
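The CNN+RNN+CTC recipe this statement describes can be sketched with standard PyTorch modules; `TinyCRNN` and all layer sizes below are illustrative assumptions, not the cited architecture.

```python
# A hedged sketch of the end-to-end trainable CNN+RNN+CTC pipeline.
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):  # hypothetical name, illustrative sizes
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # CNN encoder: collapse image height into a 1D feature sequence.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),      # (B, 128, 1, W')
        )
        self.rnn = nn.LSTM(128, hidden, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # class 0 = CTC blank

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).squeeze(2)       # (B, 128, W')
        feats = feats.permute(2, 0, 1)            # (W', B, 128) for the LSTM
        out, _ = self.rnn(feats)
        return self.fc(out).log_softmax(-1)       # (T, B, num_classes)

# CTC training needs no per-frame alignment between image columns
# and target characters, which is what makes the recipe end-to-end.
model = TinyCRNN(num_classes=37)                  # 36 symbols + blank
log_probs = model(torch.randn(4, 1, 32, 128))     # batch of 4 word images
targets = torch.randint(1, 37, (4, 6))            # 6-char labels, no blanks
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((4,), log_probs.size(0), dtype=torch.long),
    target_lengths=torch.full((4,), 6, dtype=torch.long),
)
```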
“…It is observed that the performance of attention and CTC on all the datasets degrades as the shuffle ratio increases. Specifically, attention is more sensitive than CTC because the misalignment problem can easily mislead the training process of attention [3]. In contrast, the proposed ACE loss function exhibits similar recognition results for all settings of the shuffle ratio, because it only requires the classes and their counts for supervision, completely omitting character-order information.…”
Section: Results
confidence: 96%
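A short sketch may clarify why shuffling characters cannot affect the ACE loss; it follows the formulation reported in the ACE paper, with the function and variable names chosen here for illustration.

```python
# A hedged sketch of aggregation cross-entropy (ACE): supervision uses
# only the classes in the label and their counts, never their order.
import torch

def ace_loss(probs: torch.Tensor, label_counts: torch.Tensor) -> torch.Tensor:
    """probs:        (T, num_classes) per-step class probabilities.
    label_counts: (num_classes,) character counts; index 0 is the blank."""
    T = probs.size(0)
    counts = label_counts.clone().float()
    counts[0] = T - counts[1:].sum()        # blank absorbs the unused steps
    agg = probs.sum(0) / T                  # aggregate prediction over time
    return -(counts / T * (agg + 1e-10).log()).sum()

# Shuffling the label's characters leaves label_counts, and hence the
# loss, unchanged -- matching the robustness the statement reports.
```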
“…The attention mechanism was first proposed in machine translation [1,42] to enable a model to automatically search for parts of a source sentence for prediction. Then, the method rapidly became popular in applications such as (visual) question answering [32,52], image caption generation [50,52,31], speech recognition [2,25,32] and scene text recognition [39,3,19]. Most importantly, the attention mechanism can also be applied to 2D predictions, such as mathematical expression recognition [56,57] and paragraph recognition [4,5,46].…”
Section: Attention Mechanism
confidence: 99%
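The "search for parts of a source sentence" idea can be made concrete with a minimal additive (Bahdanau-style) attention module; the class and variable names are illustrative, not from any of the cited papers.

```python
# A minimal sketch of additive attention: score each encoder position
# against the current decoder state, then take a weighted sum.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):  # hypothetical name
    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states: torch.Tensor, dec_state: torch.Tensor):
        """enc_states: (L, enc_dim) source features; dec_state: (dec_dim,)."""
        # e_i = v^T tanh(W_e h_i + W_d s): relevance of source position i
        scores = self.v(torch.tanh(self.w_enc(enc_states) + self.w_dec(dec_state)))
        alpha = scores.squeeze(-1).softmax(0)             # attention weights
        context = (alpha.unsqueeze(-1) * enc_states).sum(0)  # glimpse vector
        return context, alpha
```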
“…Generally, in the encoding stage, a convolutional neural network (CNN) is used to extract features from the input image, whereas in the decoding stage, the encoded feature vectors are transcribed into target strings by exploiting a recurrent neural network (RNN) [12], [13], connectionist temporal classification (CTC) [14] or an attention mechanism [15]. In particular, the attention-based approaches [4], [11], [16], [17], [18] often achieve better performance owing to their focus on informative areas.…”
Section: Introduction
confidence: 99%
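To show how the CTC decoding stage mentioned above turns per-frame outputs into a target string, here is a hedged sketch of greedy CTC transcription; the function name is illustrative.

```python
# Greedy CTC transcription: take the best class per frame, then
# collapse consecutive repeats and drop blanks to obtain the string.
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list[int]:
    """log_probs: (T, num_classes) frame-wise log-probabilities."""
    best = log_probs.argmax(-1).tolist()    # best class per frame
    out, prev = [], blank
    for k in best:
        if k != prev and k != blank:        # collapse repeats, drop blanks
            out.append(k)
        prev = k
    return out
```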