2020 25th International Conference on Pattern Recognition (ICPR) 2021
DOI: 10.1109/icpr48806.2021.9413069

An Integrated Approach of Deep Learning and Symbolic Analysis for Digital PDF Table Extraction

Cited by 5 publications
(6 citation statements)
references
References 21 publications
“…Our technique also applies to general DSLs in different domains rather than just XPaths for web extraction. Ideas around exploiting compositionality and data invariance have also been explored in previous works: [5,12] use commonly reoccurring phrasal patterns for web extraction given a seed set; in the vision community, modular approaches such as convolutional neural networks have been used for document image extraction [17,34,52,55,58], and notably algorithms based on R-CNN [19] use selective search to focus attention on a small number of regions from the image (region proposals). Our core ideas are similarly based around localised regions, but we detect them by identifying landmarks that present a common kind of invariance in formed documents.…”
Section: Robustness Of Experimental Results (mentioning)
confidence: 99%
“…While this approach shows improved robustness, it still generates global programs that can fail with irrelevant changes to the document format, and we show in this work how our compositional synthesis approach performs better empirically in practice. There has been very limited work in the area of synthesis for document image extraction, but notable works in specialized areas include [52], where concepts from inductive logic programming are combined with neural approaches, and [58], which combines symbolic reasoning with CNNs, though interpretable programs are not generated.…”
Section: Robustness Of Experimental Results (mentioning)
confidence: 99%
“…Deep learning techniques are now widely used to identify and extract tables in PDF documents [46], [152]. This aspect will be detailed later.…”
Section: B Data Extraction (mentioning)
confidence: 99%
“…According to Hashmi et al (2021a), on “ICDAR‐2013 Table Competition” dataset (Göbel et al, 2013), F1‐score is close to 1.0 for TD when the threshold of “Intersection Over Union” (IOU) equals 0.5, and this score reaches 0.95 (IOU = 0.5) for TSR. However, the results may be not as good if one uses stricter metrics (Zhang et al, 2021a) or more complicated tables (Zhang et al, 2022). Particularly, this fact is also confirmed by developing the domain‐specific benchmarks (Adams et al, 2021; Desai et al, 2021).…”
Section: Problem Scope (mentioning)
confidence: 99%
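
The last statement above reports F1-scores at an Intersection Over Union (IOU) threshold of 0.5 and notes that results may drop under stricter metrics. The sketch below illustrates how such a score could be computed for predicted versus ground-truth table bounding boxes. It is not the evaluation code of any cited work: the function names `iou` and `f1_at_iou`, the `(x0, y0, x1, y1)` box format, and the greedy one-to-one matching are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format (an assumption): (x0, y0, x1, y1) with x0 < x1, y0 < y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(predicted: List[Box], truth: List[Box], threshold: float = 0.5) -> float:
    """F1-score where a prediction counts as correct if it overlaps a
    not-yet-matched ground-truth box with IoU >= threshold.

    Greedy matching is an illustrative simplification; real benchmarks
    may define matching differently.
    """
    matched = set()
    true_pos = 0
    for p in predicted:
        best_j, best_score = -1, 0.0
        for j, g in enumerate(truth):
            if j in matched:
                continue
            score = iou(p, g)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            matched.add(best_j)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(truth) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    # With no predictions and no ground truth, this sketch returns 0.0.
    return 2 * true_pos / denom if denom else 0.0


truth_boxes = [(0.0, 0.0, 10.0, 10.0)]
pred_boxes = [(0.0, 0.0, 10.0, 8.0)]  # overlaps the true table with IoU 0.8
loose = f1_at_iou(pred_boxes, truth_boxes, threshold=0.5)
strict = f1_at_iou(pred_boxes, truth_boxes, threshold=0.9)
```

In this toy case the same prediction is counted as correct at a threshold of 0.5 and as a miss at 0.9, which is the kind of sensitivity to stricter metrics that the statement describes.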