Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/P19-1187

Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Abstract: Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats…
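The abstract outlines a two-stage pipeline: high-recall candidate generation from Wikipedia, then weak supervision that constrains a document-level linker to those candidate lists. Below is a minimal sketch of what such a pipeline could look like, assuming a PyTorch scoring model and an alias table built from Wikipedia anchor statistics; the helper names (`candidate_list`, `weakly_supervised_loss`, `alias_table`) are illustrative, not from the paper, and the training objective shown is one plausible choice rather than the paper's exact method.

```python
# Illustrative sketch of the two-stage approach described in the abstract.
# Stage 1: build a high-recall candidate list per mention from Wikipedia
# statistics. Stage 2: use those lists as weak supervision by constraining
# the model's output space. All helpers here are hypothetical stand-ins.

import torch
import torch.nn.functional as F

def candidate_list(mention, alias_table, top_k=30):
    """Stage 1: high-recall candidates from a Wikipedia alias->entity table.

    `alias_table` maps a lowercased mention string to (entity_id, prior)
    pairs, e.g. derived from Wikipedia anchor-link counts (an assumption).
    """
    entries = alias_table.get(mention.lower(), [])
    entries = sorted(entries, key=lambda x: x[1], reverse=True)
    return [eid for eid, _ in entries[:top_k]]

def weakly_supervised_loss(scores, candidate_ids):
    """Stage 2: treat the candidate list as weak supervision.

    `scores` are model logits over all entities for one mention. Without
    a gold label, maximize the marginal probability mass the model puts
    on the candidate set (one plausible training signal; the paper's
    actual objective may differ).
    """
    if not candidate_ids:
        return scores.new_zeros(())  # no candidates: skip this mention
    log_probs = F.log_softmax(scores, dim=-1)
    # log sum_{e in candidates} p(e | mention, document)
    cand_log_prob = torch.logsumexp(log_probs[candidate_ids], dim=-1)
    return -cand_log_prob
```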

Cited by 42 publications (53 citation statements) | References 14 publications

“…Our model is competitive with Plato in the semi-supervised setting, which additionally uses 50 million documents as unlabeled data. Le and Titov (2019)'s setting is quite different from ours in that their model is a global model (it requires document input) and is trained on Wikipedia and 30k newswire documents from the Reuters RCV1 corpus (Lewis et al. 2004). Their model is potentially trained on domain-specific data, since the CoNLL-YAGO dataset is derived from the RCV1 corpus.…”
Section: Results (mentioning, confidence: 99%)
“…the traditional methods on standard benchmarks (e.g., AIDA-CoNLL). A line of follow-up work (Le and Titov 2018; 2019a; 2019b) investigates potential improvements and other task settings based on that approach.…”
Section: Introduction (mentioning, confidence: 99%)
“…It is clear that the above-mentioned methods cannot guarantee correct labelling of the samples; however, such imperfect data can still be used for weak supervision. This strategy is used extensively for named entity recognition [20], relation extraction [21], [22], entity linking [23], and text classification [24]. Since weak supervision can introduce different types of noise into a model, in our research we combined the predicted class probabilities of the three weakly supervised models alongside uncertainty estimation to infer the sense label of each unannotated sample.…”
Section: Related Work (mentioning, confidence: 99%)
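The last statement describes fusing the class probabilities of several weakly supervised models with an uncertainty estimate before accepting a pseudo-label. Below is a minimal sketch of one plausible realization, using simple probability averaging with predictive entropy as the uncertainty signal; the cited work's exact scheme may differ, and `combine_weak_models` and the threshold value are illustrative assumptions.

```python
import numpy as np

def combine_weak_models(prob_list, entropy_threshold=0.5):
    """Combine class probabilities from several weakly supervised models.

    `prob_list`: list of arrays, each of shape (n_classes,), one per model.
    Returns (label, uncertainty), with label None when the averaged
    prediction is too uncertain to use as a pseudo-label.
    """
    avg = np.mean(prob_list, axis=0)
    # Predictive entropy of the averaged distribution as the uncertainty.
    entropy = -np.sum(avg * np.log(avg + 1e-12))
    if entropy > entropy_threshold:
        return None, entropy  # too uncertain: leave the sample unlabeled
    return int(np.argmax(avg)), entropy

# Example: three weak models voting on a 3-class sense label.
probs = [np.array([0.7, 0.2, 0.1]),
         np.array([0.6, 0.3, 0.1]),
         np.array([0.8, 0.1, 0.1])]
label, uncertainty = combine_weak_models(probs)
```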