GLASS: Global to Local Attention for Scene-Text Spotting

Ronen, Roi; Tsiper, Shahar; Anschel, Oron; Lavi, Inbal; Markovitz, Amir; Manmatha, R.

doi:10.48550/arxiv.2208.03364

Cited by 2 publications (4 citation statements)

References 41 publications (111 reference statements)


“…As shown in Table 2, we have summarized various text spotting methods. We see that A3S surpasses the recent competitive methods [1,5,14,15] on most of the metrics. In CTW1500, our method achieves 64.4 and 82.3 for "None" and "Full", respectively.…”

Section: Results (mentioning)

confidence: 80%

“…Similar to CTW1500, we confirm that A3S performs well when unavailable lexicons. The proposed method performs better than GLASS [15], which enhances features before text recognition based on the global information in an input image. It is suggested that not only visual information but also semantic one is effective in scene-text spotting.…”

Section: Results (mentioning)

confidence: 96%

“…In contrast, our work focuses on improving text recognition on scene-text spotting. Although GLASS [15] proposed an attention mechanism recently to fuse visual global and local information for text recognition accuracy, it only relies on vulnerable visual features. Our method exploits the semantic representations of the text to improve recognition.…”

Section: Related Work (mentioning)

confidence: 99%


Temporally-aware Convolutional Block Attention Module for Video Text Detection

Fujitake

Ge²

2021

2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC)


Scene-text spotting is a task that predicts a text area on natural scene images and recognizes its text characters simultaneously. It has attracted much attention in recent years due to its wide applications. Existing research has mainly focused on improving text region detection, not text recognition. Thus, while detection accuracy is improved, the end-to-end accuracy is insufficient. Texts in natural scene images tend to not be a random string of characters but a meaningful string of characters, a word. Therefore, we propose adversarial learning of semantic representations for scene text spotting (A3S) to improve end-to-end accuracy, including text recognition. A3S simultaneously predicts semantic features in the detected text area instead of only performing text recognition based on existing visual features. Experimental results on publicly available datasets show that the proposed method achieves better accuracy than other methods.


“…Text detection and recognition systems [11] and geometric layout analysis techniques [12,13] have long been developed separately as independent tasks. Research on text detection and recognition [14,15,16,17] has mainly focused on the domain of natural images and aimed at single level text spotting (mostly, word-level). Conversely, research on geometric layout analysis [12,13,18,19], which is targeted at parsing text paragraphs and forming text clusters, has assumed document images as input and taken OCR results as fixed and given by independent systems.…”

Section: Introduction (mentioning)

confidence: 99%

Towards End-to-End Unified Scene Text Detection and Layout Analysis

Qin

Panteleev

Bissacco

et al. 2022

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


We organize a competition on hierarchical text detection and recognition. The competition is aimed to promote research into deep learning models and systems that can jointly perform text detection and recognition and geometric layout analysis. We present details of the proposed competition organization, including tasks, datasets, evaluations, and schedule. During the competition period (from January 2nd 2023 to April 1st 2023), at least 50 submissions from more than 20 teams were made in the 2 proposed tasks. Considering the number of teams and submissions, we conclude that the HierText competition has been successfully held. In this report, we will also present the competition results and insights from them.

