Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval 2019
DOI: 10.1145/3331184.3331385

Web Table Extraction, Retrieval and Augmentation

Abstract: Tables are a powerful and popular tool for organizing and manipulating data. A vast number of tables can be found on the Web, and they represent a valuable knowledge resource. The objective of this survey is to synthesize and present two decades of research on web tables. In particular, we organize existing literature into six main categories of information access tasks: table extraction, table interpretation, table search, question answering, knowledge base augmentation, and table augmentation. For each of the…

Citations: cited by 32 publications (52 citation statements)
References: 64 publications (159 reference statements)
“…After locating the table on the webpage, we must further identify the validity of the table [33], [34]. In addition to displaying data and information in tabular form, tables in web pages can also be used to generate layouts and show effects.…”
Section: B. Extracting Table Data From Web Pages (mentioning)
confidence: 99%
“…Upfront, it is unclear whether Capacity refers to Team or Stadium and whether Value is a property of Team, Stadium or Coach. In [7] we have devised a novel solution for this column alignment problem, with much higher precision than prior baselines [3,11,18]. In a nutshell, we compute co-occurrence scores for entity-quantity pairs for candidate alignments, aggregating over the rows of the two columns.…”
Section: System Overview (mentioning)
confidence: 99%
“…Entity-centric knowledge extraction from web tables has been intensively explored (see, e.g., [2,3,9,11,18]). However, a common assumption has been that each table has a single subject column to which all other columns refer.…”
Section: Related Work (mentioning)
confidence: 99%
“…Modern approaches to the wide range of tasks based on structured-data (e.g. table retrieval [7,41], table classification [9], question answering [12]) now propose to leverage progress in deep learning to represent these data into a semantic vector space (also called embedding space). In parallel, an emerging task, called "data-to-text", aims at describing structured data into a natural language description.…”
Section: Related Work (mentioning)
confidence: 99%