Current Status and Performance Analysis of Table Recognition in Document Images With Deep Neural Networks

Hashmi, Khurram Azeem; Liwicki, Marcus; Stricker, Didier; Afzal, Muhammad Adnan; Afzal, Muhammad Ahtsham; Afzal, Muhammad Zeshan

doi:10.1109/access.2021.3087865

Cited by 47 publications

(40 citation statements)

References 99 publications


“…Earlier, most of the proposed methods either relied on custom heuristics or leveraged the external meta-data information to tackle the problem of table detection [22][23][24][25][26]. Later, researchers exploited statistical learning [27] followed by deep-learning-based approaches to alleviate the generalization capabilities of table detection systems [6][7][8][10][11][12][28][29][30][31][32]. This section presents a brief overview of some of these approaches.…”

Section: Related Work (mentioning)

confidence: 99%

HybridTabNet: Towards Better Table Detection in Scanned Document Images

et al. 2021


Tables in document images are an important entity since they contain crucial information. Therefore, accurate table detection can significantly improve the information extraction from documents. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses the ResNeXt-101 backbone for feature extraction and Hybrid Task Cascade (HTC) to localize the tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network. This enables our network to detect tables of arbitrary layouts precisely. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, our proposed HybridTabNet outperformed earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how the proposed method generalizes unseen data, we conduct an exhaustive leave-one-out-evaluation. In comparison to prior state-of-the-art results, our method reduced the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (Latex), 41.33% on TableBank (Word), 55.73% on TableBank (Latex + Word), 10% on Marmot, and 9.67% on the UNLV dataset. The achieved results reflect the superior performance of the proposed method.
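The "relative error" reductions quoted in these abstracts can be read as the fraction of the previous method's remaining error that the new method removes. The sketch below assumes the error is defined as 1 minus the reported score (for example, the F1-score); the papers' exact definition is not given in this listing, and the function name and the scores in the usage line are hypothetical.

```python
def relative_error_reduction(previous_score: float, new_score: float) -> float:
    """Fraction of the previous error removed by the new method.

    Assumes error = 1 - score, with scores in [0, 1].
    """
    previous_error = 1.0 - previous_score
    new_error = 1.0 - new_score
    if previous_error <= 0.0:
        raise ValueError("previous score leaves no error to reduce")
    return (previous_error - new_error) / previous_error


# Hypothetical scores, not taken from any paper: 0.90 -> 0.95 halves the error.
reduction = relative_error_reduction(0.90, 0.95)
print(f"{reduction:.1%}")
```

Under this reading, a small absolute gain near the top of the scale can still correspond to a large relative reduction, which is why the abstracts report both kinds of figures.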


“…The IoU threshold values for cascaded bounding boxes are set to [0.5, 0.6, 0.7]. We employed three different anchor ratios of [0.5, 1.0, 2.0] with only one anchor scale of [8] since FPN [24] itself performs the multi-scale detection owing to its top-down architecture. We operated with a batch size of one to train our network.…”

Section: Model Configuration (mentioning)

confidence: 99%
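The configuration quoted in this statement can be written down as a small sketch. The IoU thresholds, anchor ratios, anchor scale, and batch size come from the excerpt; the FPN strides, the height/width reading of the ratios, and all names below are assumptions made for illustration, not the authors' code.

```python
import math

# Values quoted in the citation statement.
CASCADE_IOU_THRESHOLDS = [0.5, 0.6, 0.7]  # one threshold per cascade stage
ANCHOR_RATIOS = [0.5, 1.0, 2.0]           # assumed to mean height / width
ANCHOR_SCALES = [8]                       # a single scale; FPN covers multi-scale
BATCH_SIZE = 1

# Assumption: commonly used FPN strides; the excerpt does not state them.
FPN_STRIDES = [4, 8, 16, 32, 64]


def anchor_shapes(stride, scales, ratios):
    """Return (width, height) pairs for one feature-pyramid level.

    Each anchor keeps the area (stride * scale) ** 2 while its
    height/width ratio varies.
    """
    shapes = []
    for scale in scales:
        side = stride * scale
        for ratio in ratios:
            width = side / math.sqrt(ratio)
            height = side * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


def anchors_per_level():
    """Map each assumed FPN stride to its list of anchor shapes."""
    return {s: anchor_shapes(s, ANCHOR_SCALES, ANCHOR_RATIOS) for s in FPN_STRIDES}
```

With one scale and three ratios, each pyramid level gets three anchor shapes, and the change in stride across levels supplies the size variation that extra scales would otherwise provide.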

“…Research in document analysis has been trying to develop precise information extraction systems for several years [1][2][3][4]. Although state-of-the-art optical character recognition (OCR) systems [5,6] recognize regular text with high accuracy, they are vulnerable to recognize information from page objects (tables, figures, mathematical formulas) in document images [7,8]. Figure 1 illustrates the problem in which an open-source OCR, Tesseract [4] (we use the LSTM-based version 4.1.1 available at https://github.com/tesseract-ocr/tesseract, accessed on 5 July 2021), is applied to extract the content from a document image.…”

Section: Introduction (mentioning)

confidence: 99%

Cascade Network with Deformable Composite Backbone for Formula Detection in Scanned Document Images

et al. 2021


This paper presents a novel architecture for detecting mathematical formulas in document images, which is an important step for reliable information extraction in several domains. Recently, Cascade Mask R-CNN networks have been introduced to solve object detection in computer vision. In this paper, we suggest a couple of modifications to the existing Cascade Mask R-CNN architecture: First, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to spot areas of interest better. Second, it uses a dual backbone of ResNeXt-101, having composite connections at the parallel stages. Finally, our proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold with an f1-score of 0.917, reducing the relative error by 7.8%. Moreover, we accomplished correct detection accuracy of 81.3% on embedded formulas on the Marmot dataset, which results in a relative error reduction of 30%.
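Results "at a higher IoU threshold" depend on how strictly a predicted box must overlap the ground truth before it counts as correct. The following is a minimal sketch of that check for axis-aligned boxes given as (x1, y1, x2, y2); the function names and the default threshold are illustrative assumptions, not the evaluation code used in the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct_detection(predicted, ground_truth, threshold=0.8):
    """Count a detection as correct when its IoU reaches the threshold.

    The default threshold here is a hypothetical choice for illustration.
    """
    return iou(predicted, ground_truth) >= threshold
```

Raising the threshold turns loosely localized detections into misses, so a score reported at a higher threshold reflects tighter localization.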


“…Apart from the text, documents contain graphical page objects, such as tables, figures, and formulas [ 1 , 2 ]. Albeit modern Optical Character Recognition (OCR) systems [ 3 , 4 , 5 ] can extract the information from scanned documents, they fail to interpret information from graphical page objects [ 6 , 7 , 8 , 9 ]. Figure 1 exhibits the problem of extracting tabular information from a document by applying open-source Tesseract OCR [ 10 ].…”

Section: Introduction (mentioning)

confidence: 99%

“…The problem of accurate table detection in document images is still an open problem in the research community [ 8 , 11 , 12 , 13 , 14 ]. The high amount of intra-class variance (arbitrary layouts of tables, varying presence of ruling lines) and low amount of inter-class variance (figures, charts, and algorithms equipped with horizontal and vertical lines that look like tables) makes the task of classifying and localizing tables in document images even more challenging.…”

Section: Introduction (mentioning)

confidence: 99%

CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution

et al. 2021


Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that operates on Cascade Mask R-CNN, including Recursive Feature Pyramid network and Switchable Atrous Convolution in the existing backbone architecture. By utilizing a comparatively lightweight backbone of ResNet-50, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), and memory-intensive deformable convolutions. We evaluate the proposed approach on five different publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and accomplishes comparable results on ICDAR-17 POD. Upon comparing with previous state-of-the-art results, we obtain a significant relative error reduction of 56.36%, 20%, 4.5%, and 3.5% on the datasets of ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-datasets evaluations to exhibit the generalization capabilities of the proposed method.

