Web Table Extraction, Retrieval and Augmentation

Zhang, Shuo; Balog, Krisztian

doi:10.1145/3331184.3331385

Cited by 33 publications

(53 citation statements)

References 64 publications

“…After locating the table on the webpage, we must further identify the validity of the table [33], [34]. In addition to displaying data and information in tabular form, tables in web pages can also be used to generate layouts and show effects.…”

Section: B Extracting Table Data From Web Pages (mentioning)

confidence: 99%

Enhanced Natural Language Interface for Web-Based Information Retrieval

et al. 2021

Database applications are at the core of most web application systems such as web-based email, source code repository management, public scientific data repository management, news portals, and publication repositories in various fields. However, the usage of these database systems for data and information retrieval is severely limited by the lack of support for processing search queries expressed in a natural language (NL). Most web interfaces for databases today only take search queries entered in some form of logical combination of keywords or text strings, which restricts the scope and depth of what a web user really wants to search for, even though natural language based data or information retrieval has made significant advances in recent years. To overcome or at least to alleviate such limitation in web information services, we propose in this article an improved neural model based on an existing framework IRNet for NL query of databases, in which a representation of Gated Graph Neural Network (GGNN) is introduced to encode the database entities and relations. We also represent and use the database values in the prediction model to identify and match table and column names to automatically synthesize a correct SQL statement from a query expressed in a NL sentence. Experiments with a public dataset demonstrate the promising potential of our approach.

Index Terms: Neural network, natural language processing, text-to-SQL, gated graph neural network.
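The citation statement for this entry notes that an HTML table may be used for page layout rather than for data, so a located table still has to be validated. A minimal sketch of one such check follows, assuming a simple heuristic that is not taken from the cited papers: a fragment is treated as a data table only if it contains no nested table and all of its rows have the same number of cells. The names `TableStats` and `looks_like_data_table`, and the thresholds `min_rows` and `min_cols`, are hypothetical.

```python
from html.parser import HTMLParser


class TableStats(HTMLParser):
    """Collects cell counts per row for the outermost table in a fragment.

    Assumption: the input is a single <table> fragment, not a whole page.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0       # current <table> nesting depth
        self.nested = False  # True once a table inside a table is seen
        self.rows = []       # number of cells in each outermost-table row

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.nested = True
        elif self.depth == 1:
            if tag == "tr":
                self.rows.append(0)
            elif tag in ("td", "th") and self.rows:
                self.rows[-1] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def looks_like_data_table(html, min_rows=2, min_cols=2):
    """Heuristic (an assumption): regular grid, no nesting, enough cells."""
    stats = TableStats()
    stats.feed(html)
    if stats.nested or len(stats.rows) < min_rows:
        return False
    same_width = len(set(stats.rows)) == 1
    return same_width and stats.rows[0] >= min_cols


data_html = (
    "<table><tr><th>Team</th><th>Stadium</th></tr>"
    "<tr><td>Team A</td><td>Stadium A</td></tr></table>"
)
layout_html = (
    "<table><tr><td><table><tr><td>menu</td></tr></table></td></tr></table>"
)
print(looks_like_data_table(data_html))
print(looks_like_data_table(layout_html))
```

Real systems typically combine many more signals (cell content types, markup attributes, learned classifiers); the sketch only illustrates the layout-versus-data distinction.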

“…Upfront, it is unclear whether Capacity refers to Team or Stadium and whether Value is a property of Team, Stadium or Coach. In [7] we have devised a novel solution for this column alignment problem, with much higher precision than prior baselines [3,11,18]. In a nutshell, we compute co-occurrence scores for entity-quantity pairs for candidate alignments, aggregating over the rows of the two columns.…”

Section: System Overview (mentioning)

confidence: 99%

“…Entity-centric knowledge extraction from web tables has been intensively explored (see, e.g., [2,3,9,11,18]). However, a common assumption has been that each table has a single subject column to which all other columns refer.…”

Section: Related Work (mentioning)

confidence: 99%

QuTE: Answering Quantity Queries from Web Tables

Pal, Weikum 2021

Proceedings of the 2021 International Conference on Management of Data

Quantities are financial, technological, physical and other measures that denote relevant properties of entities, such as revenue of companies, energy efficiency of cars or distance and brightness of stars and galaxies. Queries with filter conditions on quantities are an important building block for downstream analytics, and pose challenges when the content of interest is spread across a huge number of web tables and other ad-hoc datasets. Search engines support quantity lookups, but largely fail on quantity filters. The QuTE system presented in this paper aims to overcome these problems. It comprises methods for automatically extracting entity-quantity facts from web tables, as well as methods for online query processing, with new techniques for query matching and answer ranking.
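The first citation statement for this entry describes column alignment as computing co-occurrence scores for entity-quantity pairs and aggregating them over the rows of the two columns. A minimal sketch of that idea follows, assuming the aggregation is a plain sum and the highest-scoring entity column wins; the statement does not specify either choice. The function `align_quantity_column`, the `cooccurrence` callback, and the toy counts are all hypothetical and not taken from QuTE.

```python
from collections import defaultdict


def align_quantity_column(rows, entity_cols, quantity_col, cooccurrence):
    """Pick the entity column that a quantity column most plausibly describes.

    rows: list of dicts mapping column name to cell text.
    cooccurrence: callable (entity, quantity) -> numeric score. It stands in
        for an external evidence source (an assumption of this sketch).
    Returns (best_column, scores_per_column).
    """
    scores = defaultdict(float)
    for row in rows:
        for col in entity_cols:
            # Aggregate the pair score over all rows of the two columns.
            scores[col] += cooccurrence(row[col], row[quantity_col])
    best = max(entity_cols, key=lambda c: scores[c])
    return best, dict(scores)


# Toy evidence: made-up co-occurrence counts, for illustration only.
toy_counts = {
    ("Stadium A", "50000"): 4,
    ("Team A", "50000"): 1,
    ("Stadium B", "30000"): 3,
}


def toy_cooccurrence(entity, quantity):
    return toy_counts.get((entity, quantity), 0)


rows = [
    {"Team": "Team A", "Stadium": "Stadium A", "Capacity": "50000"},
    {"Team": "Team B", "Stadium": "Stadium B", "Capacity": "30000"},
]
best, scores = align_quantity_column(
    rows, ["Team", "Stadium"], "Capacity", toy_cooccurrence
)
print(best, scores)
```

With these toy counts, Capacity aligns with Stadium rather than Team, which mirrors the ambiguity raised in the citation statement.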

“…Modern approaches to the wide range of tasks based on structured-data (e.g. table retrieval [7,41], table classification [9], question answering [12]) now propose to leverage progress in deep learning to represent these data into a semantic vector space (also called embedding space). In parallel, an emerging task, called "data-to-text", aims at describing structured data into a natural language description.…”

Section: Related Work (mentioning)

confidence: 99%

A Hierarchical Model for Data-to-Text Generation

Rebuffel, Soulier, Scoutheeten, et al. 2020

Lecture Notes in Computer Science

Transcribing structured data into natural language descriptions has emerged as a challenging task, referred to as "data-to-text". These structures generally regroup multiple elements, as well as their attributes. Most attempts rely on translation encoder-decoder methods which linearize elements into a sequence. This however loses most of the structure contained in the data. In this work, we propose to overpass this limitation with a hierarchical model that encodes the data-structure at the element-level and the structure level. Evaluations on RotoWire show the effectiveness of our model w.r.t. qualitative and quantitative metrics.
