2011
DOI: 10.1007/s00778-011-0223-0

Harvesting relational tables from lists on the web

Abstract: A large number of web pages contain data structured in the form of "lists". Many such lists can be further split into multi-column tables, which can then be used in more semantically meaningful tasks. However, harvesting relational tables from such lists can be a challenging task. The lists are manually generated and hence need not have well-defined templates: they have inconsistent delimiters (if any) and often have missing information. We propose a novel technique for extracting tables from lists. The techniq…

Citations: cited by 34 publications (45 citation statements)
References: 22 publications
“…It is relatively easy to extract data from tables that are encoded using the previous tags. Unfortunately, real-world tables have a variety of intricacies that hamper the extraction process, namely: some tables are encoded using a subset of table-related tags that hardly help locate them and their cells, which does not help interpret them; other tables are encoded using listing tags (ul, ol, dl, li, dd, and dt) [9,20,36,37]; lately, it is also relatively common to find tables that are encoded using block tags (div and span) due to their ability to create responsive layouts [50]; and, generally speaking, there are many tables that are encoded using a variety of tags that are not actually related to tables, but look like tables when they are displayed [24,26].…”
Section: Table-related Vocabulary
Citation type: mentioning
confidence: 99%
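
The statement above notes that some tables are encoded with listing tags (ul, ol, dl, li, dd, and dt). As a rough illustration only, the sketch below collects the text of list items from an HTML string with Python's standard `html.parser`, which would be a natural first step before any attempt to split items into columns. The names `ListCollector` and `collect_lists` are hypothetical, and this is not the method of the cited paper or of any of the citing works.

```python
from html.parser import HTMLParser

# Tags named in the citation statement above.
LIST_TAGS = {"ul", "ol", "dl"}
ITEM_TAGS = {"li", "dt", "dd"}


class ListCollector(HTMLParser):
    """Hypothetical helper: gathers the item texts of every HTML list."""

    def __init__(self):
        super().__init__()
        self.lists = []    # finished lists, each a list of item strings
        self._stack = []   # currently open lists (innermost last)
        self._item = None  # text chunks of the item being read, if any

    def _flush_item(self):
        # Store the current item (whitespace-normalized) in the innermost list.
        if self._item is not None and self._stack:
            text = " ".join("".join(self._item).split())
            if text:
                self._stack[-1].append(text)
        self._item = None

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS:
            self._flush_item()       # text seen before a nested list stays with the outer list
            self._stack.append([])
        elif tag in ITEM_TAGS and self._stack:
            self._flush_item()       # tolerate items whose end tag is missing
            self._item = []

    def handle_endtag(self, tag):
        if tag in ITEM_TAGS:
            self._flush_item()
        elif tag in LIST_TAGS and self._stack:
            self._flush_item()
            self.lists.append(self._stack.pop())

    def handle_data(self, data):
        if self._item is not None:
            self._item.append(data)


def collect_lists(html):
    """Return one list of item strings per ul/ol/dl element found in html."""
    parser = ListCollector()
    parser.feed(html)
    parser.close()
    return parser.lists


if __name__ == "__main__":
    page = "<ul><li>Boston, MA  1630</li><li>Chicago, IL 1833</li></ul>"
    print(collect_lists(page))
```

The sketch ignores tables built from div and span blocks, which the statement also mentions, and it drops any text that follows a nested list inside the same item.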
“…Unfortunately, roughly 72% of the authors did not report on the effectiveness of their proposals; the others reported on precision, recall, and/or the F1 score. Only Elmeleegy et al. [20] and Chu et al. [9] reported on the efficiency of their approaches; their figures reveal that the algorithms behind the scenes might not be scalable enough. Regarding the resources required, only the proposals by Elmeleegy et al. [20] and Ling et al. [39] require the user to provide a few, but they do not seem to be difficult to find.…”
Section: Segmentation
Citation type: mentioning
confidence: 99%
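
For readers unfamiliar with the effectiveness measures named in the statement above, the following minimal sketch computes precision, recall, and the F1 score from counts of true positives, false positives, and false negatives. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the surveyed papers may define the counts over cells, rows, or whole tables.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts.

    Assumption: a measure whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 8 correctly extracted cells, 2 spurious, 2 missed.
    print(precision_recall_f1(8, 2, 2))
```

With the hypothetical counts in the usage line, all three measures come out at approximately 0.8.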