Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.519

Scalable Zero-shot Entity Linking with Dense Entity Retrieval

Abstract: This paper introduces a conceptually simple, scalable, and highly effective BERT-based entity linking model, along with an extensive evaluation of its accuracy-speed trade-off. We present a two-stage zero-shot linking algorithm, where each entity is defined only by a short textual description. The first stage does retrieval in a dense space defined by a bi-encoder that independently embeds the mention context and the entity descriptions. Each candidate is then re-ranked with a cross-encoder that concatenates the mention and entity text.
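A minimal sketch of the two-stage design the abstract describes: a bi-encoder that embeds mention context and entity descriptions independently for dense retrieval, followed by a cross-encoder over the concatenated pair. The model name, example texts, and the stand-in scoring head are illustrative assumptions, not the authors' released BLINK code.

```python
# Sketch only: bi-encoder retrieval + cross-encoder re-ranking.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed base model
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Independently embed texts; use the [CLS] vector as the representation."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0]  # [CLS] embeddings

mention = "He founded [SpaceX] in 2002."  # hypothetical mention with context
entities = [
    "SpaceX: American spacecraft manufacturer and launch provider.",
    "Space Exploration Day: annual observance held on July 20.",
]

# Stage 1: dense retrieval -- dot-product scores between mention and entities.
scores = embed([mention]) @ embed(entities).T
candidates = scores.squeeze(0).argsort(descending=True).tolist()

# Stage 2: re-rank each candidate with a cross-encoder that concatenates
# the mention context and the entity description into one input.
for idx in candidates:
    pair = tokenizer(mention, entities[idx], return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**pair).last_hidden_state[:, 0]
    # A trained linker would apply a learned scoring head here; the [CLS]
    # norm is only a stand-in to keep the sketch self-contained.
    print(entities[idx][:40], float(cls.norm()))
```

In the paper's setup the two stages trade off differently: the bi-encoder scales because entity embeddings can be pre-computed, while the cross-encoder is more accurate but must re-encode every mention-entity pair.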

Cited by 244 publications (434 citation statements)
References 17 publications
“…Entities in each claim are identified with BLINK (Wu et al., 2019), a model trained on Wikipedia data that links each entity to its nearest Wikipedia page. BLINK combines a bi-encoder (Urbanek et al., 2019) that identifies candidates with a cross-encoder that models the interaction between mention context and entity descriptions.…”
Section: Entity Briefs (mentioning)
Confidence: 99%
“…We observed improvements on most frequency buckets compared to DE R@1, which suggests that the model's few-shot capability can be improved by cross-lingual reading comprehension. This also offers an initial multilingual validation of a similar two-step BERT-based approach recently introduced in a monolingual setting by Wu et al. (2019), and provides a strong baseline for future work.…”
Section: Outcome (mentioning)
Confidence: 92%
“…Basing entity representations on features of their Wikipedia pages has been a common approach in EL (e.g., Sil and Florian, 2016; Francis-Landau et al., 2016; Gillick et al., 2019; Wu et al., 2019), but we will need to generalize this to include multiple Wikipedia pages with possibly redundant features in many languages.…”
Section: MEL with Wikidata and Wikipedia (mentioning)
Confidence: 99%
“…By using active sampling, we minimize labeling efforts. TrainX uses transfer learning by leveraging bi-encoders (Gillick et al., 2019; Wu et al., 2019; Logeswaran et al., 2019; Humeau et al., 2020) for disambiguation and a kNN index to retrieve candidate entities within milliseconds. We mitigate issues caused by sparse training data by using zero-shot optimized techniques that can generalize beyond the labels seen in training.…”
Section: Contribution (mentioning)
Confidence: 99%
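The kNN-index idea mentioned in this excerpt is what makes bi-encoder retrieval fast: entity embeddings are pre-computed once and candidate lookup becomes a single nearest-neighbour search. A hedged sketch using FAISS follows; the index size, dimensionality, and random vectors are illustrative placeholders, not the cited system's configuration.

```python
# Sketch only: millisecond-scale candidate retrieval over a kNN index.
import numpy as np
import faiss

dim = 768  # e.g., a BERT [CLS] embedding size (assumption)
entity_vecs = np.random.rand(100_000, dim).astype("float32")  # pre-computed entity embeddings

index = faiss.IndexFlatIP(dim)  # exact inner-product (dot-product) search
index.add(entity_vecs)

mention_vec = np.random.rand(1, dim).astype("float32")  # one embedded mention
scores, ids = index.search(mention_vec, 10)  # top-10 candidate entity ids
print(ids[0], scores[0])
```

An exact flat index already answers queries over hundreds of thousands of entities in milliseconds; approximate index types can trade a little recall for further speed at web scale.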