CNN Based Page Object Detection in Document Images

Yi, Xiaohan; Gao, Liangcai; Liao, Yuan; Zhang, Xiaode; Liu, Runtao; Jiang, Zhuoren

doi:10.1109/icdar.2017.46

Cited by 57 publications

(29 citation statements)

References 11 publications


“…This model is able to detect four different kinds of page objects: table, bar chart, pie chart and line chart. Xiaohan Yi et al [20] proposed a dynamic programming based region proposal method for page object detection.…”

Section: B. Deep Learning Based Methods (mentioning)

confidence: 99%

Graphical Object Detection in Document Images

Saha

Mondal

Jawahar

2019

2019 International Conference on Document Analysis and Recognition (ICDAR)


Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information in a document. Localizing such graphical objects in document images is therefore the first step toward understanding their content. In this paper, we present a novel end-to-end trainable deep learning framework for localizing graphical objects in document images, called Graphical Object Detection (GOD). Our framework is data-driven and does not require any heuristics or metadata to locate graphical objects. GOD uses transfer learning and domain adaptation to handle the scarcity of labeled training images for graphical object detection in document images. Performance analysis on several public benchmark datasets (ICDAR-2013, ICDAR-POD 2017 and UNLV) shows that our model yields promising results compared to state-of-the-art techniques.


“…Yi et al [10] presented a page object detection method using region proposal CNNs, followed by a custom algorithm to refine proposed regions, and a CNN classifier for object category classification. It first pre-processes the input image by applying a component-based region proposal algorithm customized for document images, which extracts the rough region proposals at the initial stage and prunes them later.…”

Section: Related Work (mentioning)

confidence: 99%
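
The component-based region proposal step described in the statement above can be illustrated with a minimal sketch. This is not the method of Yi et al.; it only shows the general idea of extracting rough proposals from connected components of a binarized page and pruning small ones. The function name `component_boxes`, the `min_area` parameter, and the toy `page` grid are hypothetical and chosen for illustration.

```python
from collections import deque


def component_boxes(binary, min_area=2):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    foreground components in a binary grid, skipping tiny components."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Pruning step (an assumed rule): drop components below min_area.
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes


# Toy binarized page: 1 = ink, 0 = background (hypothetical data).
page = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
proposals = component_boxes(page)
```

In a full pipeline of the kind the statement describes, each surviving box would then be passed to a CNN classifier to decide its object category; that stage is omitted here.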

“…Most of the early approaches heavily relied on heuristics, which are task-specific, and thus fail to generalize to novel scenarios [7,8]. Deep-learning based models have been leveraged for this segmentation in the more recent past [2, 9-12]. All of these methods involve a significant amount of pre or post-processing based on hand-designed heuristics.…”

Section: Introduction (mentioning)

confidence: 99%

Fi-Fo Detector: Figure and Formula Detection Using Deformable Networks

et al. 2020


We propose a novel hybrid approach that fuses traditional computer vision techniques with deep learning models to detect figures and formulas in document images. The approach first fuses several computer vision based image representations (color transform, connected component analysis, and distance transform) into what we term the Fi-Fo image representation. The Fi-Fo image representation is then fed to deep models for further representation learning to detect figures and formulas. The approach is evaluated on the publicly available ICDAR-2017 Page Object Detection (POD) dataset and its corrected version. It produces state-of-the-art results for formula and figure detection in document images, with F1-scores of 0.954 and 0.922, respectively. Ablation study results reveal that the Fi-Fo image representation achieves superior performance compared to the raw image representation. Results also establish that the hybrid approach helps deep models learn more discriminative and refined features.


“…In both papers the number of training items is relatively high and the results are evaluated only considering the accuracy of the model without taking into account the recall. Other authors used Faster R-CNN for page layout identification [18], for comic character face detection [15], and for arrow localization on handwritten industrial inspection sheets [5].…”

Section: Previous Work (mentioning)

confidence: 99%
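
The statement above contrasts reporting accuracy alone with also reporting recall. A minimal sketch of how precision, recall, and F1 can be computed for box detections is given below. The greedy matching rule, the IoU threshold of 0.5, and the sample boxes are assumptions made for illustration and are not taken from any of the cited papers.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def precision_recall_f1(predicted, truth, threshold=0.5):
    """Greedily match each prediction to its best unmatched ground-truth
    box; a match with IoU >= threshold counts as a true positive."""
    matched = set()
    true_positives = 0
    for p in predicted:
        best_index, best_score = None, threshold
        for i, t in enumerate(truth):
            if i in matched:
                continue
            score = iou(p, t)
            if score >= best_score:
                best_index, best_score = i, score
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical page with three objects, of which the detector finds one.
truth = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
predicted = [(0, 0, 10, 10)]
precision, recall, f1 = precision_recall_f1(predicted, truth)
```

Here every prediction is correct, so precision is perfect, yet two of the three objects are missed and recall is only one third. This is why a precision-style score reported on its own can overstate detector quality.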

Object Detection in Floor Plan Images

Ziran

Marinai

2018

Lecture Notes in Computer Science


In this work we investigate the use of deep neural networks for object detection in floor plan images. Object detection is important for understanding floor plans and is a preliminary step for their conversion into other representations. In particular, we evaluate object detection architectures, originally designed and trained to recognize objects in natural images, for recognizing furniture objects as well as doors and windows in floor plans. Although the problem is somewhat easier than the original one, the datasets available for this research are extremely small, so training deep architectures can be problematic. In addition to applying object detection architectures to floor plan images, another contribution of this paper is the creation of two datasets used for the experiments, covering different types of floor plans with different peculiarities.

