2020
DOI: 10.1007/978-3-030-62466-8_21
Tough Tables: Carefully Evaluating Entity Linking for Tabular Data

Abstract: Table annotation is a key task to improve querying the Web and to support Knowledge Graph population from legacy sources (tables). Last year, the SemTab challenge was introduced to unify different efforts to evaluate table annotation algorithms by providing a common interface and several general-purpose datasets as ground truth. The SemTab dataset is useful for gaining a general understanding of how these algorithms work, and the organizers of the challenge added some artificial noise to the data to make the…

Cited by 20 publications (12 citation statements)
References 11 publications (22 reference statements)
“…For this reason, NEST is not expected to improve the results on ST19-R4, making it a good resource to study the possible negative impact of applying NEST to algorithms. • Tough Tables (2T) [3] features ambiguous and noisy tables that resemble real-world cases. 2T has been included in the last round of SemTab 2020, showing that its high ambiguity makes it harder than any other dataset.…”
Section: Datasets
Mentioning confidence: 99%
“…Different benchmarks have shown that algorithms perform linking effectively when tables are small and labels are characterized by no or low ambiguity. However, the performance drops dramatically as soon as labels are ambiguous and the tables dimension increases, thus novel datasets have been developed with the objective of challenging algorithms to properly deal with both ambiguity and large tables [2,3]. Algorithms must tackle these challenges and improve their performance in settings covering relevant application scenarios.…”
Section: Introduction
Mentioning confidence: 99%
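The excerpt above notes that linking quality degrades once cell labels become ambiguous. As a purely illustrative sketch (not part of the cited paper), the following Python snippet queries the public Wikidata wbsearchentities endpoint to show how a single cell label such as "Paris" returns several candidate entities; this is exactly the ambiguity that datasets like 2T are designed to stress.

import requests

# Illustrative only: the paper does not prescribe this lookup. The public
# Wikidata search API is used here to list candidate entities for an
# ambiguous table-cell label.
WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def candidate_entities(label, limit=5):
    """Return (QID, description) pairs for entities matching the given label."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": "en",
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    response = requests.get(WIKIDATA_API, params=params, timeout=10)
    response.raise_for_status()
    return [(hit["id"], hit.get("description", "")) for hit in response.json()["search"]]

# "Paris" alone could be the French capital, Paris (Texas), the Trojan prince, etc.;
# a linking algorithm needs the surrounding table context to disambiguate.
for qid, description in candidate_entities("Paris"):
    print(qid, "-", description)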
“…The target KG in 2019 was DBpedia [9], while in 2020 was Wikidata [10]. A new gold standard, Tough Tables (2T) [2], was also introduced during SemTab 2020 Round4. In the context of the SemTab 2020 challenge, the table corpora are significantly large with thousands of tables and cells to annotate (cf.…”
Section: Gold Standards
Mentioning confidence: 99%
“…However, tables extracted from HTML pages on the Web (Web tables) provide at best a skewed representation of tables in the wild residing in databases, particularly enterprise databases [23,10,20]. For example, the semantic type id or identifier does not even appear among the twenty most frequent headers of WebTables [38], the largest table corpus to date.…”
Section: Introduction
Mentioning confidence: 99%