“…Accuracy
Chisholm and Hachey (2015)    88.7
Guo and Barbosa (2018)        89.0
Globerson et al. (2016)       91.0
Yamada et al. (2016)          91.5
Ganea and Hofmann (2017)      92.22 ± 0.14
Yang et al. (2018)            93.0
Le and Titov (2018)           93.07 ± 0.27
Our                           94.0 ± 0.28
Our (+pseudo entities)        …”
Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in a document to their correct references in a knowledge base (KB), e.g., Wikipedia. In this paper, we propose a novel embedding method specifically designed for NED. The proposed method jointly maps words and entities into the same continuous vector space. We extend the skip-gram model with two submodels: the KB graph model learns the relatedness of entities from the link structure of the KB, while the anchor context model aligns the vectors so that similar words and entities occur close to one another in the vector space, by leveraging KB anchors and their context words. By combining contexts based on the proposed embedding with standard NED features, we achieved state-of-the-art accuracy of 93.1% on the standard CoNLL dataset and 85.2% on the TAC 2010 dataset.
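The core idea of the abstract above — scoring candidate entities by how close their vectors lie to the mention's context words in a single shared space — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vectors here are random stand-ins (in the paper they are learned skip-gram style from Wikipedia anchors and the KB link graph), and all names (`word_vec`, `entity_vec`, `Bank_(finance)`, etc.) are hypothetical.

```python
import numpy as np

# Toy stand-ins for the jointly learned vectors: words and entities
# share one continuous vector space, so they can be compared directly.
rng = np.random.default_rng(0)
dim = 8
word_vec = {w: rng.normal(size=dim) for w in ["river", "water", "loan", "money"]}
entity_vec = {e: rng.normal(size=dim)
              for e in ["Bank_(finance)", "Bank_(geography)"]}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def context_score(entity, context_words):
    """Average word-entity cosine similarity in the shared space."""
    vecs = [word_vec[w] for w in context_words if w in word_vec]
    return sum(cosine(entity_vec[entity], v) for v in vecs) / len(vecs)

def disambiguate(candidates, context_words):
    """Pick the candidate entity whose vector best matches the context."""
    return max(candidates, key=lambda e: context_score(e, context_words))
```

In the full system this similarity would be one feature among the "standard NED features" the abstract mentions, not the sole decision rule.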
“…Lazic et al. (2015)        86.4
Huang et al. (2015)           86.6
Chisholm and Hachey (2015)    88.7
Ganea et al. (2016)           87.6
Guo and Barbosa (2016)        89.0
Globerson et al. (2016)       91.0
Yamada et al. (2016)          91.5
Ganea and Hofmann (2017)      92.2…”
We present a gradient-tree-boosting-based structured learning model for jointly disambiguating named entities in a document. Gradient tree boosting is a widely used machine learning algorithm that underlies many top-performing natural language processing systems. Surprisingly, most work limits gradient tree boosting to regular classification or regression problems, despite the structured nature of language. To the best of our knowledge, ours is the first work to employ the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation. By defining global features over previous disambiguation decisions and jointly modeling them with local features, our system produces globally optimized entity assignments for the mentions in a document. Exact inference is prohibitively expensive for our globally normalized model, so we propose Bidirectional Beam Search with Gold path (BiBSG), an approximate inference algorithm that is a variant of standard beam search. BiBSG uses global information from both past and future decisions to perform better local search. Experiments on standard benchmark datasets show that SGTB significantly improves upon published results; in particular, SGTB outperforms the previous state-of-the-art neural system by nearly 1% absolute accuracy on the popular AIDA-CoNLL dataset.
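The approximate inference described above can be illustrated with a plain left-to-right beam search over per-mention candidates, where each partial assignment is scored by a local term plus pairwise coherence with already-chosen entities (the "global features over previous disambiguation decisions"). This is a generic sketch, not the paper's BiBSG, which additionally searches in both directions and tracks the gold path during training; `local_score` and `coherence` are assumed callables supplied by the caller.

```python
def beam_search(mentions, candidates, local_score, coherence, beam_size=4):
    """Left-to-right beam search over candidate entities per mention.

    Each hypothesis is a tuple of chosen entities plus a cumulative
    score: local_score(mention, entity) for the new choice, plus
    coherence(entity, prev) with every previously chosen entity.
    """
    beam = [((), 0.0)]  # (partial entity assignment, cumulative score)
    for m in mentions:
        expanded = []
        for assignment, score in beam:
            for e in candidates[m]:
                global_term = sum(coherence(e, prev) for prev in assignment)
                expanded.append(
                    (assignment + (e,), score + local_score(m, e) + global_term)
                )
        expanded.sort(key=lambda x: x[1], reverse=True)
        beam = expanded[:beam_size]  # keep only the top-k hypotheses
    return beam[0]  # best full assignment and its score
```

The global term is what makes the assignments "collective": an entity that coheres with earlier choices can beat one with a higher local score alone.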
“…We carried out our experiments in the standard setting but used other (unlabeled) data for training, as described below. We used six test sets: AIDA CoNLL 'testb' (Hoffart et al., 2011), aka AIDA-B; MSNBC, AQUAINT, and ACE2004, as cleaned and updated by Guo and Barbosa (2016); and CWEB and WIKI, automatically extracted from ClueWeb (Guo and Barbosa, 2016; Gabrilovich et al., 2013). We use the AIDA CoNLL 'testa' data (aka AIDA-A) as our development set (216 documents).…”
Section: Setting
“…We also compare to recent state-of-the-art systems trained supervisedly on Wikipedia plus extra supervision, or on AIDA CoNLL: Chisholm and Hachey (2015), Guo and Barbosa (2016), Globerson et al. (2016), Yamada et al. (2016), Ganea and Hofmann (2017), and Le and Titov (2018). Chisholm and Hachey (2015) used supervision in the form of links to Wikipedia from non-Wikipedia pages, Wikilinks (Singh et al., 2012).…”
Modern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high-recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabeled texts, learns to choose entities relying both on the local context of each mention and on coherence with other entities in the document. The resulting approach rivals fully supervised state-of-the-art systems on standard test sets. It also approaches their performance in the very challenging setting where it is tested on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial.

[1] The best reported in-domain scores are 93.1% F1 (Le and Titov, 2018), whereas the best previous out-of-domain score is only 85.7% F1 (Guo and Barbosa, 2016), an average over 5 standard out-of-domain test sets (Table 1).
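The first stage described above — building a high-recall candidate list per mention — is commonly done by looking up the mention string in an alias table built from Wikipedia anchor texts. The sketch below shows one plausible shape for such a lookup; the table format, normalizations, and all names (`alias_table`, `candidate_list`) are illustrative assumptions, not the paper's exact procedure.

```python
def candidate_list(mention, alias_table, max_candidates=30):
    """High-recall candidate generation via an alias table.

    alias_table maps a surface form to a list of (entity, link_count)
    pairs, e.g. harvested from Wikipedia anchor texts. We look up the
    mention and a few simple normalizations, then keep the entities
    with the highest link counts.
    """
    scored = {}
    for form in {mention, mention.lower(), mention.title()}:
        for entity, count in alias_table.get(form, []):
            # Keep the best count seen for each entity across forms.
            scored[entity] = max(scored.get(entity, 0), count)
    return sorted(scored, key=scored.get, reverse=True)[:max_candidates]
```

Recall matters more than precision here: the list only needs to contain the correct entity somewhere, since the latent-variable model is what ultimately chooses among the candidates.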