GLASS: Global to Local Attention for Scene-Text Spotting

Ronen, Roi; Tsiper, Shahar; Anschel, Oron; Lavi, Inbal; Markovitz, Amir; Manmatha, R.

doi:10.1007/978-3-031-19815-1_15

Cited by 18 publications

(8 citation statements)

References 41 publications


“…On Total-Text, we surpass all current state-of-the-art in both settings. Some of these prior arts [21,49,67] fine-tune their models on Total-Text which boosts the performance on this target dataset at the cost of dropping performance on others. Also note that, some prior arts [21,22,26,44] limit recognition to case-insensitive letters and no punctuation symbols, while ours operate in a case-sensitive mode, a more difficult but more important one.…”

Section: Comparison With State-of-the-art Results (mentioning)

confidence: 99%

“…Text detection stage produces bounding polygons or rotated bounding boxes for text instances at one granularity, usually words. Text instances are cropped from input image pixels [4], encoded backbone features [26,45], or both [49]. The text recognition stage decodes the text transcription.…”

Section: Related Work (mentioning)

confidence: 99%

“…The extraction and comprehension of text in images play a critical role in many computer vision applications. Text spotting algorithms have progressed significantly in recent years [33,42,45,49,67], specifically within the task of detecting [2,28,36,63] and recognizing [5,12,40,41,59] individual text instances in images. Previously, defining the geometric layout [7,9,24,62] of extracted textual content occurred independent of text spotting and remained focused on document images.…”

Section: Introduction (mentioning)

confidence: 99%

“…Existing text spotting methods [45,49,67] most commonly extract text at the word level, where 'word' is defined as a sequence of characters delimited by space without taking into account the text context. Recently, the Unified Detector [34], which is built upon detection transformer [58], detects text 'lines' with instance segmentation mask and produces an affinity matrix for paragraph grouping in an tion (HTR) of text entities in images.…”

Section: Introduction (mentioning)

confidence: 99%


Towards End-to-End Unified Scene Text Detection and Layout Analysis

Qin, Panteleev, Bissacco, et al. 2022

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


We organize a competition on hierarchical text detection and recognition. The competition is aimed to promote research into deep learning models and systems that can jointly perform text detection and recognition and geometric layout analysis. We present details of the proposed competition organization, including tasks, datasets, evaluations, and schedule. During the competition period (from January 2nd 2023 to April 1st 2023), at least 50 submissions from more than 20 teams were made in the 2 proposed tasks. Considering the number of teams and submissions, we conclude that the HierText competition has been successfully held. In this report, we will also present the competition results and insights from them.


“…While there are many advanced OCR technologies that one can apply [7,8,9,10,11], we aim at carrying out all AI computation on-device for better privacy, connection and latency. To build a prototype quickly, our first system is modularized to the following three major components: word detection, word recognition, grouping and ordering.…”

Section: Baseline OCR System (mentioning)

confidence: 99%

A Multiplexed Network for End-to-End, Multilingual OCR

Huang, Pang, Kovvuri, et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds. We propose to uniformly use word error rates (WER) as a new measurement for evaluating scene-text OCR, both end-to-end (e2e) performance and individual system component performances. Particularly for the e2e metric, we name it DISGO WER as it considers Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally we propose to utilize the concept of super blocks to automatically compute BLEU scores for e2e OCR machine translation. The small SCUT public test set is used to demonstrate WER performance by a modularized OCR system.
