InfoGather

Yakout, Mohamed; Ganjam, Kris; Chakrabarti, Kaushik; Chaudhuri, Surajit

doi:10.1145/2213836.2213848

Cited by 180 publications

(20 citation statements)

References 15 publications

“…Thus, the input to the process is a table and the corresponding output is an enriched table. [145] identified three core tasks in the augmentation of tables.…”

Section: Tabular Search (mentioning)

confidence: 99%

“…The values of the most similar table are then used to populate the input table's additional column. The InfoGather system [145] uses a similar approach, but instead of just calculating the direct similarity between the input table and potential augmenting tables, it also takes into account the neighborhood around the potential augmenting tables. These indirect tables provide ancillary information that can be better suited for augmentation than the tables with the highest similarity to the input tables.…”

Section: Tabular Search (mentioning)

confidence: 99%
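
The idea in the statement above, scoring a candidate table by its neighborhood as well as by its direct similarity to the input table, can be illustrated with a minimal sketch. This is not the InfoGather algorithm itself: the Jaccard overlap on entity keys, the one-hop neighborhood, the `damping` weight, and the function names are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of entity keys (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighborhood_scores(query_keys, tables, damping=0.5):
    """Score each candidate table by direct plus one-hop indirect similarity.

    tables: dict mapping a table name to its set of entity keys.
    damping: hypothetical weight given to the indirect evidence.
    """
    # Direct similarity between the input table and every candidate.
    direct = {name: jaccard(query_keys, keys) for name, keys in tables.items()}

    scores = {}
    for name, keys in tables.items():
        # Indirect evidence: a candidate inherits part of the direct score
        # of each other table it overlaps with.
        indirect = sum(
            direct[other] * jaccard(keys, other_keys)
            for other, other_keys in tables.items()
            if other != name
        )
        scores[name] = direct[name] + damping * indirect
    return scores
```

With this sketch, a table that shares no keys with the input table can still receive a non-zero score if it overlaps with a table that matches the input directly, which is the behaviour the statement attributes to indirect tables.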

Dataset search: a survey

et al. 2019

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods and highlight open problems. We look at approaches and implementations from related areas dataset search is drawing upon, including information retrieval, databases, entity-centric and tabular search in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward.

“…Before feeding the record sets returned by data extraction into a particular application, it is commonly necessary to perform some of the following integration tasks: semantisation [25,45,54,55,60,63,71], which either maps the descriptors onto the terminology box of a particular ontology or the tuples onto its assertion box [19]; union [23], which merges record sets that provide similar data; finding primary keys [62], which determines which components of the tuples identify them as univocally as possible; record linkage [8,11,12], which finds different records that refer to the same actual entities; augmentation [6,52,67], which joins record sets on the same topic to complete the information that they provide individually; and cleaning [10,31,61], which fixes data. Note that the integration tasks are orthogonal to data extraction because they are independent from the source of the record sets, which is the reason why they fall out of the scope of this article.…”

Section: Data-extraction Vocabulary (mentioning)

confidence: 99%

“…In this context, data extraction consists in transforming tables into structured formats that focus on their data and abstract away from how they are displayed. Data extraction has many applications to text mining [24,64,65], data (meta-)search [3,9,18,26,44,51,63,64,65], query expansion [16], document summarisation [40,64], question answering [1,20,44,46,65], knowledge discovery [9,22,26,32,44,46], knowledge base construction [17,72], knowledge augmentation [1,9,18,20,56,56,57,67], synonym finding [1,3,39], improving accessibility [43,47,49,64,65], textual advertising [15], data compression [2,49], or creating linked data…”

Section: Introduction (mentioning)

confidence: 99%

On extracting data from tables that are encoded using HTML

Roldán, Jiménez, Corchuelo 2020

Knowledge-Based Systems

Tables are a common means to display data in human-friendly formats. Many authors have worked on proposals to extract those data back since this has many interesting applications. In this article, we summarise and compare many of the proposals to extract data from tables that are encoded using HTML and have been published between 2000 and 2018. We first present a vocabulary that homogenises the terminology used in this field; next, we use it to summarise the proposals; finally, we compare them side by side. Our analysis highlights several challenges to which no proposal provides a conclusive solution and a few more that have not been addressed sufficiently; simply put, no proposal provides a complete solution to the problem, which seems to suggest that this research field shall keep active in the near future. We have also realised that there is no consensus regarding the datasets and the methods used to evaluate the proposals, which hampers comparing the experimental results.

“…Table extension and augmentation aims at gathering relational tables that contain the same entities but cover complementary attributes of those entities, and integrating these tables by joining them on the shared entities. For example, Yakout et al. [38] propose InfoGather for populating a table of entities with their attributes by harvesting related tables on the Web. The users need to provide either the desired attribute names of the entities or example values of their attributes.…”

Section: General NLP and IE (mentioning)

confidence: 99%
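
The augmentation-by-attribute-name mode described in the statement above can be sketched as a join on entity names. This is only an illustration under stated assumptions: the row layout (dicts with a hypothetical `"entity"` field), the first-match-wins rule, and the function name are not taken from InfoGather.

```python
def augment_by_attribute(entities, attribute, web_tables):
    """Fill `attribute` for each entity from related tables, joining on entity name.

    web_tables: list of tables, each a list of dict rows that carry an
    "entity" field (an assumed layout) plus arbitrary attribute fields.
    Returns a dict mapping each entity to the first value found, or None.
    """
    filled = {entity: None for entity in entities}
    for rows in web_tables:
        for row in rows:
            entity = row.get("entity")
            # Keep the first value seen for an entity; later tables do not override it.
            if entity in filled and filled[entity] is None and attribute in row:
                filled[entity] = row[attribute]
    return filled
```

A practical system would also need to reconcile conflicting values and match entity names approximately; the sketch leaves both out.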

Effective and efficient Semantic Table Interpretation using TableMiner+

Zhang 2017

101

125

This article introduces TableMiner+, a Semantic Table Interpretation method that annotates Web tables in a way that is both effective and efficient. Built on our previous work TableMiner, the extended version advances the state of the art in several ways. First, it improves annotation accuracy by making innovative use of various types of contextual information both inside and outside tables as features for inference. Second, it reduces computational overheads by adopting an incremental, bootstrapping approach that starts by creating preliminary and partial annotations of a table using 'sample' data in the table, then using the outcome as 'seed' to guide interpretation of remaining contents. This is then followed by a message passing process that iteratively refines results on the entire table to create the final optimal annotations. Third, it is able to handle all annotation tasks of Semantic Table Interpretation (e.g., annotating a column, or entity cells) while state-of-the-art methods are limited in different ways. We also compile the largest dataset known to date and extensively evaluate TableMiner+ against four baselines and two reimplemented (near-identical, as adaptations are needed due to the use of different knowledge bases) state-of-the-art methods. TableMiner+ consistently outperforms all models under all experimental settings. On the two most diverse datasets covering multiple domains and various table schemata, it achieves improvement in F1 by between 1 and 42 percentage points depending on specific annotation tasks. It also significantly reduces computational overheads in terms of wall-clock time when compared against classic methods that 'exhaustively' process the entire table content to build features for inference. As a concrete example, compared against a method based on joint inference implemented with parallel computation, the non-parallel implementation of TableMiner+ achieves significant improvement in learning accuracy and almost orders of magnitude of savings in wall-clock time.
