Convolutional Character Networks

Xing, Liyuan; Tian, Zhi; Huang, Weilin; Scott, Matthew R.

doi:10.1109/iccv.2019.00922

Cited by 162 publications

(108 citation statements)

References 38 publications

(115 reference statements)

“…Recently, in order to sufficiently exploit the complementarity between detection and recognition, many methods [45], [4], [5], [6], [46], [7], [17], [47], [37], [48], [49], [50] are proposed to spot text in an end-to-end manner, which utilize the recognition information to optimize the localization task.…”

Section: A Text Reading In Single Images (mentioning)

confidence: 99%

FREE: A Fast and Robust End-to-End Video Text Spotter

Cheng, Lü, Zou, et al. 2021

IEEE Trans. on Image Process.

Currently, video text spotting tasks usually follow a four-stage pipeline: detecting text regions in individual images, recognizing the localized text regions frame by frame, tracking text streams, and post-processing to generate final results. However, such pipelines may suffer from huge computational cost as well as suboptimal results, owing to interference from low-quality text and the non-trainable pipeline strategy. In this paper, we propose a fast and robust end-to-end video text spotting framework named FREE, which recognizes each localized text stream only once instead of frame by frame. Specifically, FREE first employs a well-designed spatial-temporal detector that learns text locations among video frames. Then a novel text recommender is developed to select the highest-quality text from text streams for recognition. Here, the recommender is implemented by assembling text tracking, quality scoring and recognition into a trainable module. It not only avoids interference from low-quality text but also dramatically speeds up video text spotting. FREE unites the detector and recommender into a single framework, which helps achieve global optimization. In addition, we collect a large-scale video text dataset to promote the video text spotting community, containing 100 videos from 21 real-life scenarios. Extensive experiments on public benchmarks show that our method greatly speeds up the text spotting process and also achieves state-of-the-art results.
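The idea of recognizing each text stream only once can be illustrated with a minimal sketch. It covers only the selection step and assumes tracking and quality scoring have already produced a track ID and a quality score per detection. The `TextInstance` type and `pick_best_per_stream` function are hypothetical names, not part of FREE, whose recommender is a trainable module rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextInstance:
    """One detected text region in one frame (hypothetical structure)."""
    track_id: int    # which text stream the detection belongs to
    frame: int       # frame index in the video
    quality: float   # assumed quality score, higher is better


def pick_best_per_stream(instances):
    """Return {track_id: highest-quality TextInstance} for each stream."""
    best = {}
    for inst in instances:
        current = best.get(inst.track_id)
        # Keep the first instance seen, or replace it with a better one.
        if current is None or inst.quality > current.quality:
            best[inst.track_id] = inst
    return best
```

Only the instances returned here would be passed to a recognizer, so recognition cost grows with the number of text streams rather than with the number of frames.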

“…He et al [31] proposed an end-to-end framework based on SSD by introducing an text attention module, which enables a direct text mask supervision and achieves strong performance improvements by training text detection and recognition jointly. Xing [32] proposed a one-stage model that processes text detection and recognition simultaneously.…”

Section: Scene Text Spotting With Deep Learning (mentioning)

confidence: 99%

A Single Neural Network for Mixed Style License Plate Detection and Recognition

2021

Most existing methods for automatic license plate recognition (ALPR) focus on a specific license plate (LP) type, but little work focuses on multiple or mixed LPs. This paper proposes a single neural network called ALPRNet for detection and recognition of mixed style LPs. In ALPRNet, two fully convolutional one-stage object detectors are used to detect and classify LPs and characters simultaneously, followed by an assembly module that outputs the LP strings. ALPRNet treats LPs and characters equally: the object detectors directly output bounding boxes of LPs and characters with corresponding labels, so they avoid the recurrent neural network (RNN) branches used for optical character recognition (OCR) in existing recognition approaches. We evaluate ALPRNet on a mixed LP style dataset and two datasets with a single LP style. The experimental results show that the proposed network achieves state-of-the-art results with a simple one-stage network.

INDEX TERMS: ALPRNet, license plate recognition, object recognition, convolutional neural network.
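The abstract does not say how the assembly module turns detections into strings. As a rough sketch of one possibility, the code below assumes single-line plates read left to right, and assigns each character to the plate whose box contains the character's center. The function names and the `(x1, y1, x2, y2)` box format are assumptions made for illustration, not ALPRNet's actual design.

```python
def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def contains(box, point):
    """True if the point lies inside the box (borders included)."""
    x1, y1, x2, y2 = box
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def assemble_plates(plate_boxes, char_detections):
    """Build one string per plate from (box, label) character detections."""
    strings = []
    for plate in plate_boxes:
        # Collect (center_x, label) for characters centered inside this plate.
        inside = [
            (center(box)[0], label)
            for box, label in char_detections
            if contains(plate, center(box))
        ]
        inside.sort(key=lambda item: item[0])  # left-to-right reading order
        strings.append("".join(label for _, label in inside))
    return strings
```

Multi-line plates would need an extra step, such as grouping characters into rows before sorting, which this sketch leaves out.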

“…However, most existing scene text benchmarks do not include character-level annotations. To solve this problem, we use the iterative learning approach [9] to obtain character-level data. Instead of iterating from synthetic data in [9], we directly utilize their trained model to get character labels for further iterations.…”

Section: Implementation Details (mentioning)

confidence: 99%

“…To remove the background area and more precisely locate characters rather than text, Mask TextSpotter [7], inspired by Mask-RCNN [8], proposes to detect all characters for each bounding box proposal and then perform character recognition. Recently, based on the observation that the two-stage framework which involves ROI pooling degrades the text recognition performance, CharNet [9] proposes a one-stage architecture to achieve higher efficiency. CharNet follows the pipeline of Mask TextSpotter and groups characters to text by the guidance of the relative position between detection results of characters and texts.…”

Section: Introduction (mentioning)

confidence: 99%

Pointer Networks for Arbitrary-Shaped Text Spotting

Zhang, Yang, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Current text spotting methods perform text detection and text recognition separately. However, in complex scenes where bounding boxes of texts with various shapes often overlap, text detection becomes error-prone. By contrast, character detection is less ambiguous and easier to learn. In this paper, we present a highly efficient one-stage method named PointerNet for arbitrary-shaped text spotting. Unlike previous methods, PointerNet does not rely on text detection and opens a novel spotting-by-character-detection paradigm. In particular, to connect characters to texts, we propose a simple yet highly effective strategy named pointer that learns the 2D offset from the center of the current character to the center of the subsequent character. Evaluations demonstrate that our PointerNet achieves state-of-the-art performance and is more efficient than current methods (75ms vs. 133ms compared with FOTS). Our code will be publicly available.
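The pointer strategy described above can be sketched at inference time as follows. This illustration assumes each detected character carries a label, a center, and a predicted 2D offset to the next character's center, with `None` marking the last character of a word. The dict layout, the `max_dist` tolerance, and the greedy nearest-center matching are assumptions, not details taken from the PointerNet paper.

```python
import math


def link_characters(chars, max_dist=5.0):
    """Chain characters into words by following predicted center offsets.

    chars: list of dicts with 'label', 'center' (x, y) and 'offset'
    (dx, dy), where offset is None for the final character of a word.
    """
    next_of = {}      # index of a character -> index of its successor
    has_prev = set()  # indices already claimed as someone's successor
    for i, ch in enumerate(chars):
        if ch["offset"] is None:
            continue
        # Predicted location of the next character's center.
        tx = ch["center"][0] + ch["offset"][0]
        ty = ch["center"][1] + ch["offset"][1]
        best_j, best_d = None, max_dist
        for j, other in enumerate(chars):
            if j == i or j in has_prev:
                continue
            d = math.hypot(other["center"][0] - tx, other["center"][1] - ty)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            next_of[i] = best_j
            has_prev.add(best_j)

    words = []
    for start in range(len(chars)):
        if start in has_prev:
            continue  # not the first character of a word
        text, i, seen = "", start, set()
        while i is not None and i not in seen:
            seen.add(i)
            text += chars[i]["label"]
            i = next_of.get(i)
        words.append(text)
    return words
```

Because the chain follows character centers rather than a text box, the same linking applies to curved or rotated words, which is the property the abstract highlights.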
