2009
DOI: 10.14778/1687627.1687749

Harvesting relational tables from lists on the web

Abstract: A large number of web pages contain data structured in the form of "lists". Many such lists can be further split into multi-column tables, which can then be used in more semantically meaningful tasks. However, harvesting relational tables from such lists can be a challenging task. The lists are manually generated and hence need not have well-defined templates: they have inconsistent delimiters (if any) and often have missing information. We propose a novel technique for extracting tables from lists. The techniq…

citations
Cited by 72 publications
(43 citation statements)
references
References 26 publications
(39 reference statements)
“…In recent years, harvesting knowledge from the web [11,24,25,28] has attracted more and more attention. For example, Google's Freebase [1] has collected and published more than 39 million real world entities, with more than 140,000 attributes.…”
Section: Proceedings of the VLDB (mentioning)
confidence: 99%
“…One distinguishable feature of our work is the ability to gather and leverage domain knowledge at runtime to automatically tune the integration process. The massive exploitation of the structured Web has been studied for data published in HTML tables and lists [10,20]. However, these works focus on the extraction of rich relational schemas, without addressing the issue of integrating the extracted data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Traditional IE techniques considered in the database community tend to be source-centric, i.e., they can only be deployed to extract from a specific website or data source. However, a range of domain-independent techniques have emerged recently [2,4,9,10,11,16,20] that seek to look at extraction holistically on the entire Web.…”
Section: Introduction (mentioning)
confidence: 99%
“…There are some domain-independent efforts, e.g. WebTables [4,10], that extract all simple tables and lists from the Web and store them as relational data. However, domain-independence makes it difficult to attach semantics to the extracted data.…”
Section: Introduction (mentioning)
confidence: 99%