2013 12th International Conference on Document Analysis and Recognition 2013
DOI: 10.1109/icdar.2013.181

Segmenting Tables via Indexing of Value Cells by Table Headers

Abstract: Correct segmentation of a web table into its component regions is the essential first step to understanding tabular data. Our algorithmic solution to the segmentation problem relies on the property that strings defining row and column header paths uniquely index each data cell in the table. We segment the table using only "logical layout analysis" without resorting to any appearance features or natural language understanding. We start with a CSV
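
The indexing property described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a minimal check under stated assumptions: the table is already a rectangular grid of strings (for example, parsed from a CSV file), spanned header cells have been filled in rather than left blank, and the header region is the top `n_header_rows` rows and the left `n_header_cols` columns. The function names `header_paths_unique` and `find_header_split` are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[str]]


def header_paths_unique(grid: Grid, n_header_rows: int, n_header_cols: int) -> bool:
    """Check that every data column has a distinct column header path and
    every data row has a distinct row header path (assumed layout: headers
    occupy the top rows and the left columns)."""
    data_rows = range(n_header_rows, len(grid))
    data_cols = range(n_header_cols, len(grid[0]))
    # Column header path: the header strings stacked above a data column.
    col_paths = [tuple(grid[r][c] for r in range(n_header_rows)) for c in data_cols]
    # Row header path: the header strings to the left of a data row.
    row_paths = [tuple(grid[r][c] for c in range(n_header_cols)) for r in data_rows]
    return (len(set(col_paths)) == len(col_paths)
            and len(set(row_paths)) == len(row_paths))


def find_header_split(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the smallest (header rows, header cols) split whose header
    paths are unique, or None if no split qualifies. The 'smallest first'
    ordering is an illustrative choice, not taken from the paper."""
    n_rows, n_cols = len(grid), len(grid[0])
    candidates = [(r, c) for r in range(1, n_rows) for c in range(1, n_cols)]
    for r, c in sorted(candidates, key=lambda rc: (rc[0] + rc[1], rc[0])):
        if header_paths_unique(grid, r, c):
            return (r, c)
    return None


# Hypothetical example table with a two-level column header.
grid = [
    ["",      "2012", "2012", "2013", "2013"],
    ["",      "Q1",   "Q2",   "Q1",   "Q2"],
    ["North", "1",    "2",    "3",    "4"],
    ["South", "5",    "6",    "7",    "8"],
]
print(find_header_split(grid))
```

With one header row the column paths repeat ("2012" appears twice), so the check only succeeds once both header rows are included. Each data cell is then addressed by a unique pair of row path and column path.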

citations
Cited by 18 publications
(16 citation statements)
references
References 15 publications
(17 reference statements)
“…For those data sets the syntactical structure of the content is more important, as also shown by Cortez et al [5]. Seth et al [19] proposed a reliable approach for table normalization based on sequential circuit analysis, which does not require background knowledge, but only succeeds on tables with unique access path to data cells.…”
Section: Related Work (mentioning)
confidence: 92%
“…Attempts at segmentation using table grammars (syntactic pattern recognition) did not give acceptable results either [10, 11]. However, segmentation based on indexing, even though more primitive than our current method, resulted in 98.5% accuracy [12]. The indexing property is fundamental and deserves to be incorporated in any [13,14] Importing and querying visual tables in a Data Base Management System (DBMS) was originally proposed for scanned paper tables [15], and much later for Web tables [16].…”
Section: Previous Work (mentioning)
confidence: 99%
“…VeriClick, an interactive tool for table segmentation and ground-truthing, was described in [61]. We introduced algorithmic table segmentation, based on the fundamental indexing property, in [62]. Some other conference reports of our experiments on various aspects of table processing are cited in the above publications.…”
Section: Our Earlier Work (mentioning)
confidence: 99%
“…In addition to the already-mentioned IEA/AIE'11 [60] and ICDAR'13 [62] papers, three precursors to this article have recently appeared in conference proceedings. At the 2014 Document Analysis Systems workshop, we reported on our initial, automatic end-to-end conversion of web tables to relational databases [63].…”
Section: Our Earlier Work (mentioning)
confidence: 99%