Towards End-to-End Unified Scene Text Detection and Layout Analysis

Long, Shangbang; Qin, Siyang; Panteleev, Dmitry; Bissacco, Alessandro; Fujii, Yasuhisa; Raptis, Michalis

doi:10.1109/cvpr52688.2022.00112

Cited by 46 publications

(13 citation statements)

References 100 publications

“…We also notice our heuristic-based grouping and ordering does not work well on curved text. Machine learning based grouping and ordering, such as [5,6,21,22] is the solution.…”

Section: E2e Baseline Results (mentioning)

confidence: 99%

“…For example, for layout analysis, we need to evaluate both grouping and ordering for the downstream applications. Hiertext [5] uses PQ for evaluating grouping exclusively, but not ordering. [6] uses three different metrics to capture different angles of the e2e performance: local accuracy (similar to the GO error in Section 3.2 based on leadership), local continuity (similar to ngram precisions in BLEU), and global accuracy (measuring exact block accuracy).…”

Section: Introduction (mentioning)

confidence: 99%

A Multiplexed Network for End-to-End, Multilingual OCR

Huang, Pang, Kovvuri, et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds. We propose to uniformly use word error rates (WER) as a new measurement for evaluating scene-text OCR, both end-to-end (e2e) performance and individual system component performances. Particularly for the e2e metric, we name it DISGO WER as it considers Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally we propose to utilize the concept of super blocks to automatically compute BLEU scores for e2e OCR machine translation. The small SCUT public test set is used to demonstrate WER performance by a modularized OCR system.

“…STKM [16] is a text knowledge mining model based on self-attention mechanisms for text detection tasks. Long et al [17] propose an end-to-end model that combines scene text detection and visual layout analysis, enhancing text detection performance.…”

Section: Related Work (mentioning)

confidence: 99%

Intelligent electronic document layout recognition via deep learning

Zuo, Du, et al. 2024

Fifteenth International Conference on Graphics and Image Processing (ICGIP 2023)

A novel intelligent electronic document layout recognition method via deep learning is proposed. A text detection approach is used to detect the string position along with region, and those adjacent regions are merged based on the distance between text zones, then the document layout style is determined by calculating the match degree between the printed document and the publication template set. The proposed recognition method constructs an electronic document representation tree, and the location of the area bounding box is added to the tree. The maximum match distance between the trees is calculated and is used for judging the document layout based on the structural similarity. Experimental results show that this method can quickly and accurately distinguish electronic documents among different layout styles. Users can not only recognize the layout of a printed publication in real time, but also find the desired layout style of the printed publication from a large number of printed publication images. The given method could meet different usage needs in practical applications.

“…The HA has been proposed and used for many document analysis and recognition tasks, many of them related with full-page, end-to-end training and/or text image recognition [56,33,34,29].…”

Section: Metrics Related With the Hungarian Algorithm (mentioning)

confidence: 99%

End-to-End Page-Level Assessment of Handwritten Text Recognition

Vidal, Toselli, Ríos-Vila, et al. 2023

Preprint

The evaluation of Handwritten Text Recognition (HTR) systems has traditionally used metrics based on the edit distance between HTR and ground truth (GT) transcripts, at both the character and word levels. This is very adequate when the experimental protocol assumes that both GT and HTR text lines are the same, which allows edit distances to be independently computed for each given line. Driven by recent advances in pattern recognition, HTR systems increasingly face the end-to-end page-level transcription of a document, where the precision of locating the different text lines and their corresponding reading order (RO) play a key role. In such a case, the standard metrics do not take into account the inconsistencies that might appear. In this paper, the problem of evaluating HTR systems at the page level is introduced in detail. We analyze the convenience of using a two-fold evaluation, where the transcription accuracy and the RO goodness are considered separately. Different alternatives are proposed, analyzed and empirically compared both through partially simulated and through real, full end-to-end experiments. Results support the validity of the proposed two-fold evaluation approach. An important conclusion is that such an evaluation can be adequately achieved by just two simple and well-known metrics: the Word Error Rate, that takes transcription sequentiality into account, and the here re-formulated Bag of Words Word Error Rate, that ignores order. While the latter directly and very accurately assesses intrinsic word recognition errors, the difference between both metrics gracefully correlates with the Spearman's Foot Rule Distance, a metric which explicitly measures RO errors associated with layout analysis flaws.
