2015 IEEE/ACM 2nd International Symposium on Big Data Computing (BDC) 2015
DOI: 10.1109/bdc.2015.30

Building the Dresden Web Table Corpus: A Classification Approach

Cited by 64 publications (62 citation statements)
References 15 publications
Supporting: 0 · Mentioning: 59 · Contrasting: 0
“…Unfortunately, there is not a consensus definition in the literature regarding what a table is. Many authors focus on the encoding since they define them as whatever one can encode within HTML table tags [3,7,9,16,18,28,30,32,34,38,47,49,49,59,66,68], which is a pragmatic approach; a few also refer to the display of data, since they define tables as grids in which data are located in cells in a manner that lines and/or styles ease interpreting them [22,24,26,28,30,32,46,49]. There is only a proposal that deviates a little from the previous approaches [21] since the authors focus on the data model behind the tables, independently from how they are displayed; their proposal, however, works on tables in which data are arranged in grids.…”
Section: Table-related Vocabulary
Citation type: mentioning
confidence: 99%
“…Most authors differentiate between data tables, which provide data to be extracted, and non-data tables, which are used for layout purposes or to provide utilities. Many of them make also a difference between listings, forms, matrices, and enumerations [16,18,26,30,34,44,46,66], although the exact terminology used is very diverging; there is also a proposal in which tables are classified according to whether they have headers or not [22].…”
Section: Table-related Vocabulary
Citation type: mentioning
confidence: 99%
“…Chu et al [13] further introduced syntactic and semantic coherence measures for list extraction, and learned semantic coherence measures by leveraging column co-occurence statistics from existing web tables. Several researchers produced web tables from the public Common Crawl [1, 24,15], thereby making them available to a broad audience outside the large Web companies. Wang, et al [36] improved extraction quality by leveraging curated knowledge bases.…”
Section: Extraction
Citation type: mentioning
confidence: 99%
“…It also discusses heuristics, which are based on features similar to the paper above, to classify Web tables into the proposed taxonomy. In (Eberius et al, 2015), the authors describe the creation of the Dresden Web Table Corpus, by proposing a classification approach that works on the level of different table layout classes.…”
Section: Table Recognition and Layout Discovery
Citation type: mentioning
confidence: 99%