TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting

Feng, Wei; He, Wenhao; Yin, Fei; Zhang, Xu-Yao; Liu, Cheng‐Lin

doi:10.1109/iccv.2019.00917

Cited by 176 publications

(79 citation statements)

References 29 publications

“…In this section, we compare our network with the state-of-the-art approaches [1], [3], [10], [11], [16], [18], [20], [21], [23], [24], [27]-[29], [43], [45], [49], [65], [66], [69], [69], [71]-[73] on six different benchmark datasets. We consider recall, precision, and f-measure as the metrics for evaluation of accuracy of detection.…”

Section: Comparison With State-of-the-art Results (mentioning)

confidence: 99%

“…Mask TextSpotter [11] uses semantic segmentation to detect text of arbitrary shapes and spatial attention for handling text instances of irregular shapes by simultaneously considering local and global textual information. TextDragon [73] describes the shape of text with a sequence of quadrangles to handle the text of arbitrary shapes and RoISlide that connect a deep network and connectionist temporal classification based text recognizer. The labeling of locations of characters is not needed.…”

Section: Scene Text Spotting (mentioning)

confidence: 99%

Cluttered TextSpotter: An End-to-End Trainable Light-Weight Scene Text Spotter for Cluttered Environment

2020

Scene text spotting aims at simultaneously localizing and recognizing text instances, symbols, and logos in natural scene images. Scene text detection and recognition approaches have received immense attention in computer vision research community. The presence of partial occlusion or truncation artifact due to the cluttered background of scene images creates an obstacle in perceiving the text instances, which makes the process of spotting very complex. In this paper, we propose a lightweight scene text spotter that can address the issue of cluttered environment of scene images. It is an end-to-end trainable deep neural network that uses local part information, global structural features, and context cue information of oriented region proposals for spotting text instances. It helps to localize in scene images with background clutters, where partially occluded text parts, truncation artifacts, and perspective distortions are present. We mitigate the problem of misclassification caused by inter-class interference by exploring inter-class separability and intra-class compactness. We also incorporate multi-language character segmentation and word-level recognition in a lightweight recognition module. We have used six publicly available benchmark datasets in different smart devices to illustrate the efficacy of the network.

“…It can be seen from Table II that we have also achieved very good performance on the CTW1500 data set. Especially in the text detection task, it is 1.1% higher than [30].…”

Section: Curved Text (mentioning)

confidence: 93%

Toward Arbitrary-Shaped Text Spotting Based on End-to-End

et al. 2020

At present, text spotting in natural scenes has become one of the research hotspots. Among them, curvilinear text and long text are the main difficulties of text spotting in natural scenes. To better solve these two types of problems, we propose a novel end-to-end text spotting model. The model includes three parts: shared convolution module, text detector module and text recognizer module. For the problem of long text, we adopt the corner attention mechanism to extract the features of long text more effectively. For the problem of curve text, we feed the rectification feature map into the SA-BiLSTM decoder to recognize the curve text more effectively. More importantly, the joint optimization strategy realizes the mutual promotion function of the text detection task and the text recognition task. Experimental results on TotalText, ICDAR2015, ICDAR2013, CTW1500, COCO-Text and MLT datasets prove that our method achieves excellent performance and robustness in text spotting tasks based on end-to-end natural scenes.

“…Recently, in order to sufficiently exploit the complementarity between detection and recognition, many methods [45], [4], [5], [6], [46], [7], [17], [47], [37], [48], [49], [50] are proposed to spot text in an end-to-end manner, which utilize the recognition information to optimize the localization task.…”

Section: A Text Reading In Single Images (mentioning)

confidence: 99%

“…In fact, it is a very challenging task to optimize video text spotter end-to-end when taking multiple functional modules (text detection, text tracking and text recognition) into consideration, especially compared to the traditional four-staged pipeline strategy. Therefore, in this paper we develop an end-to-end trainable video text spotter with only two trainable modules: the video text detector and the text recommender, similar to the end-to-end text spotting methods [6], [17], [45], [47], [48], [49] in single images.…”

Section: B Text Reading In Videos (mentioning)

confidence: 99%

FREE: A Fast and Robust End-to-End Video Text Spotter

Cheng, Lü, Zou, et al. 2021

IEEE Trans. on Image Process.

Currently, video text spotting tasks usually fall into the four-staged pipeline: detecting text regions in individual images, recognizing localized text regions frame-wisely, tracking text streams and post-processing to generate final results. However, they may suffer from the huge computational cost as well as suboptimal results due to the interferences of low-quality text and the non-trainable pipeline strategy. In this paper, we propose a fast and robust end-to-end video text spotting framework named FREE by only recognizing the localized text stream one time instead of frame-wise recognition. Specifically, FREE first employs a well-designed spatial-temporal detector that learns text locations among video frames. Then a novel text recommender is developed to select the highest-quality text from text streams for recognizing. Here, the recommender is implemented by assembling text tracking, quality scoring and recognition into a trainable module. It not only avoids the interferences from the low-quality text but also dramatically speeds up the video text spotting. FREE unites the detector and recommender into a whole framework, and helps achieve global optimization. Besides, we collect a large scale video text dataset for promoting the video text spotting community, containing 100 videos from 21 real-life scenarios. Extensive experiments on public benchmarks show our method greatly speeds up the text spotting process, and also achieves the remarkable state-of-the-art.
