2020
DOI: 10.1007/978-3-030-62466-8_21
Tough Tables: Carefully Evaluating Entity Linking for Tabular Data

Abstract: Table annotation is a key task to improve querying the Web and to support Knowledge Graph population from legacy sources (tables). Last year, the SemTab challenge was introduced to unify different efforts to evaluate table annotation algorithms by providing a common interface and several general-purpose datasets as ground truth. The SemTab dataset is useful for gaining a general understanding of how these algorithms work, and the organizers of the challenge added some artificial noise to the data to make the…

Cited by 20 publications (12 citation statements)
References 11 publications (22 reference statements)
“…For this reason, NEST is not expected to improve the results on ST19-R4, making it a good resource to study the possible negative impact of applying NEST to algorithms. • Tough Tables (2T) [3] features ambiguous and noisy tables that resemble real-world cases. 2T has been included in the last round of SemTab 2020, showing that its high ambiguity makes it harder than any other dataset.…”
Section: Datasets
Mentioning confidence: 99%
“…Different benchmarks have shown that algorithms perform linking effectively when tables are small and labels are characterized by no or low ambiguity. However, the performance drops dramatically as soon as labels are ambiguous and the tables dimension increases, thus novel datasets have been developed with the objective of challenging algorithms to properly deal with both ambiguity and large tables [2,3]. Algorithms must tackle these challenges and improve their performance in settings covering relevant application scenarios.…”
Section: Introduction
Mentioning confidence: 99%
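The excerpt above notes that linking quality degrades once cell labels become ambiguous. As a purely illustrative sketch (not part of the cited paper), the following Python snippet queries the public Wikidata wbsearchentities endpoint to show how a single cell label such as "Paris" returns several candidate entities; this is exactly the ambiguity that datasets like 2T are designed to stress.

import requests

# Illustrative only: the paper does not prescribe this lookup. The public
# Wikidata search API is used here to list candidate entities for an
# ambiguous table-cell label.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def candidate_entities(label, limit=5):
    """Return (QID, description) pairs for entities matching the given label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": "en",
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    response = requests.get(WIKIDATA_API, params=params, timeout=10)
    response.raise_for_status()
    return [(hit["id"], hit.get("description", "")) for hit in response.json()["search"]]

# "Paris" alone could be the French capital, Paris (Texas), the Trojan prince, etc.;
# a linking algorithm needs the surrounding table context to disambiguate.
for qid, description in candidate_entities("Paris"):
    print(qid, "-", description)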
“…The target KG in 2019 was DBpedia [9], while in 2020 was Wikidata [10]. A new gold standard, Tough Tables (2T) [2], was also introduced during SemTab 2020 Round4. In the context of the SemTab 2020 challenge, the table corpora are significantly large with thousands of tables and cells to annotate (cf.…”
Section: Gold Standards
Mentioning confidence: 99%
“…However, tables extracted from HTML pages on the Web (Web tables) provide at best a skewed representation of tables in the wild residing in databases, particularly enterprise databases [23,10,20]. For example, the semantic type id or identifier does not even appear among the twenty most frequent headers of WebTables [38], the largest table corpus to date.…”
Section: Introduction
Mentioning confidence: 99%