Abstract: In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms; off-the-shelf OCR components cannot process it reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components, which leads to a high-level conceptual understanding of documents and makes their digitization viable. Since the advent of deep learning, deep …
“…Manual investigation of the cross-dataset evaluation reveals that other graphical page objects [2] are misinterpreted as tables. However, with the obtained results, it is evident that our proposed CasTabDetectoRS produces state-of-the-art results on a specific dataset and generalizes well over the other datasets.…”
Section: Results (mentioning; confidence: 99%)
“…Analogous to the field of computer vision, the power of deep learning has made a remarkable impact in the field of table analysis in document images [2, 8]. To the best of our knowledge, Hao et al. [46] introduced the idea of implementing Convolutional Neural Networks (CNNs) to identify spatial features from document images.…”
Section: Related Work (mentioning; confidence: 99%)
“…The digitization of documents facilitates the process of extracting information without manual intervention. Apart from the text, documents contain graphical page objects, such as tables, figures, and formulas [1, 2]. Although modern Optical Character Recognition (OCR) systems [3, 4, 5] can extract the information from scanned documents, they fail to interpret information from graphical page objects [6, 7, 8, 9].…”
Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that builds on Cascade Mask R-CNN, incorporating a Recursive Feature Pyramid network and Switchable Atrous Convolution into the existing backbone architecture. By utilizing a comparatively lightweight ResNet-50 backbone, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), or memory-intensive deformable convolutions. We evaluate the proposed approach on five different publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and achieves comparable results on ICDAR-17 POD. Compared with previous state-of-the-art results, we obtain significant relative error reductions of 56.36%, 20%, 4.5%, and 3.5% on ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-dataset evaluations to demonstrate the generalization capabilities of the proposed method.
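The abstracts here report gains as relative error reduction, i.e. the fraction by which the remaining error (1 − F1, or 1 − accuracy) shrinks against a baseline. A minimal sketch of that computation, using hypothetical scores rather than figures from the papers:

```python
def relative_error_reduction(baseline_f1: float, new_f1: float) -> float:
    """Fraction by which the error (1 - F1) shrinks versus a baseline."""
    baseline_error = 1.0 - baseline_f1
    new_error = 1.0 - new_f1
    return (baseline_error - new_error) / baseline_error

# Hypothetical example: improving F1 from 0.890 to 0.945
# halves the remaining error (50% relative error reduction).
print(round(relative_error_reduction(0.890, 0.945), 2))  # 0.5
```

This explains why headline percentages can look large even when absolute F1 gains are small: near-saturated benchmarks leave little error to reduce.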
“…Subsequently, the method distinguished formulas from other graphical page objects such as figures and tables by applying custom heuristics. Later, researchers investigated the capabilities of Deep Neural Networks (DNNs) for the problem of formula identification in document images [27,35]. To the best of our knowledge, He et al. [36] exploited Convolutional Neural Networks (CNNs) with spatial context to detect mathematical symbols in document images.…”
This paper presents a novel architecture for detecting mathematical formulas in document images, an important step toward reliable information extraction in several domains. Recently, Cascade Mask R-CNN networks have been introduced to solve object detection problems in computer vision. In this paper, we propose two modifications to the existing Cascade Mask R-CNN architecture: first, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to better capture regions of interest; second, it uses a dual ResNeXt-101 backbone with composite connections at the parallel stages. Finally, our proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold with an F1-score of 0.917, reducing the relative error by 7.8%. Moreover, we achieve a correct detection accuracy of 81.3% on embedded formulas on the Marmot dataset, which results in a relative error reduction of 30%.
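The "higher IoU threshold" above refers to Intersection over Union, the standard overlap measure used to decide whether a detected box matches a ground-truth box. A minimal, self-contained sketch; the box coordinates are illustrative, not from the paper:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection shifted by 10 px against a 100x100 ground-truth box:
print(round(iou((0, 0, 100, 100), (10, 10, 110, 110)), 3))  # 0.681
```

Raising the IoU threshold (e.g. from 0.5 to 0.8) makes matching stricter, so detections must localize formulas tightly to count as correct.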
“…Along with the text, digital documents contain various graphical page objects, such as tables, figures, and formulas [1]. While state-of-the-art OCR (Optical Character Recognition) systems [2,3,4] can process the raw text in document images, they struggle to extract information from graphical page objects [5].…”
Tables in document images are an important entity since they contain crucial information. Therefore, accurate table detection can significantly improve information extraction from documents. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses a ResNeXt-101 backbone for feature extraction and a Hybrid Task Cascade (HTC) to localize tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network, which enables our network to precisely detect tables of arbitrary layouts. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, our proposed HybridTabNet outperforms earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how the proposed method generalizes to unseen data, we conduct an exhaustive leave-one-out evaluation. In comparison to prior state-of-the-art results, our method reduced the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (Latex), 41.33% on TableBank (Word), 55.73% on TableBank (Latex + Word), 10% on Marmot, and 9.67% on the UNLV dataset. The achieved results reflect the superior performance of the proposed method.