2014 22nd International Conference on Pattern Recognition 2014
DOI: 10.1109/icpr.2014.479

Transforming Web Tables to a Relational Database

Abstract: HTML tables represent a significant fraction of web data. The often complex headers of such tables are determined accurately using their indexing property. Isolated headers are factored to extract hierarchical categories. Web tables are then transformed into a canonical form and imported into a relational database. The proposed processing allows for the formulation of arbitrary SQL queries over the collection of induced relational tables.
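The abstract's first step, finding the header through the indexing property, can be illustrated with a simplified sketch. It assumes the table has already been parsed into a rectangular grid of cell strings (spanning cells replicated into every cell they cover) and that the header occupies the top `r` rows and the left `c` columns. A split is treated as indexing when every data column has a distinct column-header path and every data row has a distinct row-header path; the search returns the indexing split with the smallest `r + c`. This is a brute-force illustration, not the authors' MIPS algorithm, and `is_indexing`, `minimal_header` and the sample table are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[str]]

def is_indexing(grid: Grid, r: int, c: int) -> bool:
    """Check whether the top r rows and left c columns index every data cell."""
    n_rows, n_cols = len(grid), len(grid[0])
    if not (0 < r < n_rows and 0 < c < n_cols):
        return False
    # Column-header path of each data column: the r header cells above it.
    col_paths = [tuple(grid[i][j] for i in range(r)) for j in range(c, n_cols)]
    # Row-header path of each data row: the c header cells to its left.
    row_paths = [tuple(grid[i][j] for j in range(c)) for i in range(r, n_rows)]
    # Indexing means no two data columns (or data rows) share the same path.
    return (len(set(col_paths)) == len(col_paths)
            and len(set(row_paths)) == len(row_paths))

def minimal_header(grid: Grid) -> Optional[Tuple[int, int]]:
    """Brute-force search for the indexing split (r, c) with the smallest r + c."""
    n_rows, n_cols = len(grid), len(grid[0])
    candidates = [
        (r + c, r, c)
        for r in range(1, n_rows)
        for c in range(1, n_cols)
        if is_indexing(grid, r, c)
    ]
    if not candidates:
        return None
    _, r, c = min(candidates)  # ties broken by fewer header rows
    return r, c

# Hypothetical table: two-level column header (year, sex), one-column row header.
table = [
    ["", "2010", "2010", "2011", "2011"],
    ["", "Men", "Women", "Men", "Women"],
    ["Ohio", "1", "2", "3", "4"],
    ["Iowa", "5", "6", "7", "8"],
]
print(minimal_header(table))  # (2, 1)
```

On the sample, a one-row column header is not indexing because "2010" and "2011" each label two data columns, so the search settles on two header rows and one header column.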

Cited by 14 publications (32 citation statements). References 21 publications.

“…Chen et al proposed an extraction system to convert spreadsheet data into relational tuples. Authors in [34] investigated the nature of HTML tables from the web. We have provided the comparison of related techniques in Table 2.…”
Section: Schema Extraction and Matching; citation type: mentioning
confidence: 99%
“…Table 3 shows the distributions of the 198 non-trivial SAUS row and column header sizes. The data shows that multi-row column headers are more frequent (99) than multi-column row headers (64). The statistics on header sizes, prefixed rows and columns, number of row and column categories, and number of notes rows are based on analysis of the minimal indexing headers found by MIPS that do not depend on subjective interpretation of the table.…”
Section: HTML Tables (Troy 200); citation type: mentioning
confidence: 99%
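The header-size statistics in the statement above amount to a tally over the minimal header found for each table. A minimal sketch, assuming each table's header has been summarized as a hypothetical `(column-header rows, row-header columns)` pair; the input below is made up for illustration and is not the SAUS data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def header_size_stats(headers: Iterable[Tuple[int, int]]) -> Dict[str, object]:
    """Summarize minimal header sizes given as (column-header rows, row-header columns)."""
    headers = list(headers)
    return {
        # Distribution of column-header heights and row-header widths.
        "col_header_rows": Counter(r for r, _ in headers),
        "row_header_cols": Counter(c for _, c in headers),
        # Tables whose column header spans more than one row.
        "multi_row_col_headers": sum(1 for r, _ in headers if r > 1),
        # Tables whose row header spans more than one column.
        "multi_col_row_headers": sum(1 for _, c in headers if c > 1),
    }

# Hypothetical header sizes for four tables.
stats = header_size_stats([(1, 1), (2, 1), (3, 2), (1, 2)])
print(stats["multi_row_col_headers"], stats["multi_col_row_headers"])  # 2 2
```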
“…At the 2014 Document Analysis Systems workshop, we reported on our initial, automatic end-to-end conversion of web tables to relational databases [63]. We showed SQL queries on HTML tables imported into MS-Access at ICPR 2014 [64]. At the 2015 IST/SPIE Conference on Document Recognition and Retrieval, we clustered the headers of category hierarchies to reveal commonalities among tables [65].…”
Section: Our Earlier Work; citation type: mentioning
confidence: 99%
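The conversion described in this statement, and in the abstract, ends with a canonical form that a relational engine can query. One way to sketch it, assuming each header row and each header column is a single flat category (no factored hierarchy) and that data cells are numeric: emit one tuple per data cell carrying all of its header labels, load the tuples into one relation, and query with ordinary SQL. The sketch uses Python's built-in `sqlite3` module in place of the MS-Access database mentioned above; `to_canonical`, `load_table` and the category names are hypothetical.

```python
import sqlite3
from typing import List, Sequence, Tuple

Grid = List[List[str]]

def to_canonical(grid: Grid, r: int, c: int) -> List[Tuple]:
    """One tuple per data cell: row-header labels, column-header labels, numeric value."""
    records = []
    for i in range(r, len(grid)):
        row_labels = tuple(grid[i][j] for j in range(c))
        for j in range(c, len(grid[0])):
            col_labels = tuple(grid[k][j] for k in range(r))
            records.append(row_labels + col_labels + (float(grid[i][j]),))
    return records

def load_table(conn: sqlite3.Connection, name: str, categories: Sequence[str],
               grid: Grid, r: int, c: int) -> None:
    """Create a relation with one TEXT column per category plus a REAL value column."""
    if len(categories) != r + c:
        raise ValueError("need one category name per header row and header column")
    # Names are interpolated directly, so they are assumed to be trusted.
    columns = ", ".join(f'"{cat}" TEXT' for cat in categories) + ', "value" REAL'
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    placeholders = ", ".join("?" * (len(categories) + 1))
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})',
                     to_canonical(grid, r, c))

table = [
    ["", "2010", "2010", "2011", "2011"],
    ["", "Men", "Women", "Men", "Women"],
    ["Ohio", "1", "2", "3", "4"],
    ["Iowa", "5", "6", "7", "8"],
]

conn = sqlite3.connect(":memory:")
# Row-header category first, then column-header categories from top to bottom.
load_table(conn, "population", ["state", "year", "sex"], table, r=2, c=1)
result = conn.execute(
    'SELECT state, SUM("value") FROM population WHERE year = ? '
    'GROUP BY state ORDER BY state',
    ("2011",),
).fetchall()
print(result)  # [('Iowa', 15.0), ('Ohio', 7.0)]
```

Giving each category its own column is what allows arbitrary filtering and aggregation across categories. A fuller version would first factor multi-level headers into category hierarchies, as the abstract describes, rather than treating each header row or column as one flat category.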
“…The foundations for analyzing table headers that we developed over the years have been published in a succession of conference papers cited in our report at ICPR 2014 [25] and updated in our recent IJDAR article [1].…”
Section: Prior Work; citation type: mentioning
confidence: 99%