2021
DOI: 10.1109/tip.2020.3038520

FREE: A Fast and Robust End-to-End Video Text Spotter

Abstract: Currently, video text spotting tasks usually follow a four-stage pipeline: detecting text regions in individual frames, recognizing the localized text regions frame by frame, tracking text streams, and post-processing to generate final results. However, these methods may suffer from a huge computational cost as well as suboptimal results due to interference from low-quality text and the non-trainable pipeline strategy. In this paper, we propose a fast and robust end-to-end video text spotting framework named FREE …
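The conventional four-stage pipeline that the abstract contrasts FREE against can be sketched as follows. This is an illustrative outline only: all function names are hypothetical placeholders, not FREE's actual API or the cited baselines' code.

```python
# Hypothetical sketch of the conventional four-stage video text
# spotting pipeline: detect -> recognize -> track -> post-process.
# Every function here is an illustrative stub.

def detect_text_regions(frame):
    # Stage 1: detect text regions in an individual frame.
    # Placeholder: would return a list of (box, crop) pairs.
    return []

def recognize_region(crop):
    # Stage 2: recognize one localized text region frame by frame.
    return ""

def track_streams(per_frame_results):
    # Stage 3: link per-frame detections into text streams
    # (the same text instance followed across frames).
    return per_frame_results  # identity placeholder

def postprocess(streams):
    # Stage 4: e.g. vote for the best transcription per stream.
    return streams

def four_stage_spotter(frames):
    # Each stage depends fully on the previous one, so errors from
    # low-quality frames propagate and nothing is trained jointly --
    # the drawback the abstract points out.
    per_frame = []
    for frame in frames:
        regions = detect_text_regions(frame)
        per_frame.append([(box, recognize_region(crop))
                          for box, crop in regions])
    return postprocess(track_streams(per_frame))
```

The separation of stages is what makes the pipeline non-trainable end to end, motivating FREE's unified design.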

Cited by 21 publications (29 citation statements)
References 91 publications (147 reference statements)
“…- Annotations. The annotation strategy is the same as in LSVTD [4]. For each text region, the annotation items are as follows: (1) polygon coordinates represent the text location.…”
Section: Dataset
confidence: 99%
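An annotation record in the style described above might look like the following. The field names are assumptions for illustration; the actual schema is defined by the LSVTD [4] annotation format.

```python
# Illustrative annotation record: a polygon marks one text region
# in one frame. Field names are hypothetical, not the LSVTD schema.

annotation = {
    "frame_id": 17,
    "instance_id": 3,          # same text instance tracked across frames
    "polygon": [               # (x, y) vertices of the text region
        (120, 40), (260, 42), (258, 78), (118, 76),
    ],
    "transcription": "EXIT",
}

def polygon_bbox(polygon):
    # Axis-aligned bounding box of a polygon; a common convenience
    # for quick overlap checks during evaluation.
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)
```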
“…However, in many real-world applications, sequence-level spotting results are what users most urgently need, while frame-wise recognition results are of far less concern to them. Therefore, we propose sequence-level evaluation protocols to evaluate end-to-end performance, i.e., Recall_s, Precision_s, and F-score_s as used in [4]. Here, a predicted text sequence is regarded as a true positive if and only if it satisfies two constraints:…”
Section: Task 3 - End-to-End Video Text Spotting
confidence: 99%
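The sequence-level scoring described above can be sketched as a standard one-to-one matching of predicted sequences to ground-truth sequences. This is a minimal illustration: the matching predicate is passed in as a parameter because the excerpt's two true-positive constraints (the spatial/temporal overlap and recognition criteria of [4]) are not reproduced here.

```python
# Minimal sketch of sequence-level Precision_s / Recall_s / F-score_s.
# A predicted sequence is a true positive only if it matches an
# as-yet-unmatched ground-truth sequence under `is_match`, which
# stands in for the two constraints of the cited protocol.

def sequence_level_scores(predictions, ground_truths, is_match):
    matched_gt = set()
    tp = 0
    for pred in predictions:
        for i, gt in enumerate(ground_truths):
            if i not in matched_gt and is_match(pred, gt):
                matched_gt.add(i)   # each ground truth matches at most once
                tp += 1
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```

For example, with exact-transcription matching as a stand-in predicate, two predictions against two ground truths sharing one common sequence yield precision, recall, and F-score of 0.5 each.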
“…Yu et al. [21] try to learn the feature embedding in an online association manner. Cheng et al. [22] and [23] propose trackers for text instances using metric-learning methods. For these tracking-by-detection approaches, the association step fully depends on the first step, i.e., detecting text in the spatial dimension.…”
Section: Introduction
confidence: 99%