Post-OCR Paragraph Recognition by Graph Convolutional Networks

Wang, Renshen; Fujii, Yuki; Popat, Ashok C.

doi:10.1109/wacv51458.2022.00259

Cited by 19 publications

(21 citation statements)

References 24 publications

“…These methods fail to produce word or line level detections and can only be used in company with standalone text detectors, increasing the complexity of the pipeline. Another branch of work [54] takes a hierarchical view and apply graph-based models on the finest granularity, i.e. individual words, to analyze the layout.…”

Section: Layout Analysis (mentioning)

confidence: 99%

“…We therefore carefully select the following baselines representing non-end-to-end methods: Commercial solution: The GCP API, as mentioned above, is a commercial solution that produces text detection and recognition results at word, line and paragraph level. GCN Post-Processing: The GCN [20] based postprocessing method (GCN-PP) [54] applies the GCN on text line bounding boxes to cluster lines into paragraphs. Object detection baselines: PubLayNet [62] formulates the layout analysis as an instance segmentation task predicting text clusters as pixel masks.…”

Section: Baselines (mentioning)

confidence: 99%

“…For MaX-DeepLab-Cluster, we follow the same hyper-parameter and training settings of our unified detector for fair comparison. For GCN-PP, we follow the settings in [54] to train the line clustering model. As mentioned above, these methods can only perform layout analysis based on detected text entities.…”

Section: Experimental Settings (mentioning)

confidence: 99%

“…space, as the only supervision signal. Conversely, geometric layout analysis algorithms [2,26,54,58,62] focus on digital documents and either assume word-level text information as given [2,54,58] or directly predict geometric structures without reasoning for their atomic elements [62]. We ask: Can there be a reconciliation of text entity detection and geometric layout analysis?…”

Section: Introduction (mentioning)

confidence: 99%

Towards End-to-End Unified Scene Text Detection and Layout Analysis

Qin, Panteleev, Bissacco et al. 2022

Preprint

Self Cite

Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. The first hierarchical scene text dataset is introduced to enable this novel research task. We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way. Comprehensive experiments show that our unified model achieves better performance than multiple well-designed baseline methods. Additionally, this model achieves state-of-the-art results on multiple scene text detection datasets without the need of complex post-processing. Dataset and code: https://github.com/google-research-datasets/hiertext.

“…Graph convolutional networks (GCNs) are becoming a prominent type of neural networks due to their capability of handling non-Euclidean data [4]. They naturally fit many problems in OCR and document analysis, and have been applied to help form lines [5] [6] [7], paragraphs [3] or other types of document entities [8]. Besides the quality gain from these GCN models, another benefit from these approaches is that we can potentially combine all the machine learning tasks and build a single, unified, multi-task GCN model.…”

Section: Introduction (mentioning)

confidence: 99%

Unified Line and Paragraph Detection by Graph Convolutional Networks

Liu, Wang, Raptis et al. 2022

Preprint

Self Cite

We formulate the task of detecting lines and paragraphs in a document into a unified two-level clustering problem. Given a set of text detection boxes that roughly correspond to words, a text line is a cluster of boxes and a paragraph is a cluster of lines. These clusters form a two-level tree that represents a major part of the layout of a document. We use a graph convolutional network to predict the relations between text detection boxes and then build both levels of clusters from these predictions. Experimentally, we demonstrate that the unified approach can be highly efficient while still achieving state-of-the-art quality for detecting paragraphs in public benchmarks and real-world images.
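
The two-level clustering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the GCN is replaced by hand-written lists of box pairs, and the function names (`cluster_by_links`, `build_layout_tree`) and the split into "same line" and "same paragraph" links are assumptions made for illustration. Clusters are taken to be the connected components of the predicted links, found with a union-find structure.

```python
from collections import defaultdict


def cluster_by_links(num_items, links):
    """Group items 0..num_items-1 into the connected components of `links`."""
    parent = list(range(num_items))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(num_items):
        groups[find(i)].append(i)
    return sorted(groups.values())


def build_layout_tree(num_boxes, same_line_links, same_paragraph_links):
    """Build paragraphs -> lines -> boxes from two sets of box-pair links.

    Both link lists are stand-ins (assumed here) for thresholded model
    predictions on pairs of text detection boxes.
    """
    # Level 1: a line is a cluster of boxes.
    lines = cluster_by_links(num_boxes, same_line_links)
    box_to_line = {box: idx for idx, line in enumerate(lines) for box in line}

    # Level 2: lift box-level paragraph links to line level, then cluster lines.
    line_links = {(box_to_line[a], box_to_line[b]) for a, b in same_paragraph_links}
    paragraphs = cluster_by_links(len(lines), line_links)
    return [[lines[i] for i in paragraph] for paragraph in paragraphs]


# Boxes 0-1 form one line, boxes 2-3 another, box 4 stands alone;
# the link (1, 2) joins the first two lines into one paragraph.
tree = build_layout_tree(5, [(0, 1), (2, 3)], [(1, 2)])
print(tree)  # [[[0, 1], [2, 3]], [[4]]]
```

Taking connected components means that a single false-positive link merges two clusters, so a real system would likely need a more conservative grouping rule or a confidence threshold on the predicted links.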