Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016
DOI: 10.18653/v1/n16-1072
Cross-lingual Wikification Using Multilingual Embeddings

Abstract: Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wikipedia titles. The proposed method can be applied to all languages represented in Wikipedia, including those for which …
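To make the abstract's approach concrete: if words from any language and English Wikipedia titles share one embedding space, a mention can be grounded by comparing a vector built from its context against the vectors of candidate English titles. The sketch below is a minimal illustration of that ranking step only, with toy random vectors; the names (word_vecs, title_vecs, rank_candidates) are hypothetical stand-ins, not the paper's code.

```python
# Minimal sketch of cross-lingual wikification with a shared multilingual
# embedding space. All vectors here are random toy data; in the paper the
# embeddings are trained jointly for words and Wikipedia titles.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical shared space: foreign words and English titles, same dimension.
word_vecs = {w: rng.normal(size=50) for w in ["universität", "stadt", "musik"]}
title_vecs = {t: rng.normal(size=50) for t in
              ["Chicago", "University_of_Chicago", "Chicago_(band)"]}

def context_vector(context_words):
    """Represent a mention by averaging the embeddings of its context words."""
    vecs = [word_vecs[w] for w in context_words if w in word_vecs]
    return np.mean(vecs, axis=0)

def rank_candidates(context_words, candidate_titles):
    """Rank candidate English titles by cosine similarity to the context."""
    c = context_vector(context_words)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(candidate_titles,
                  key=lambda t: cos(title_vecs[t], c), reverse=True)

# German mention "Chicago" with surrounding (lowercased) context words:
print(rank_candidates(["universität", "stadt"], list(title_vecs)))
```

With trained embeddings rather than random ones, contexts mentioning a university would pull University_of_Chicago above the city or the band; here only the mechanics of the ranking are shown.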

Cited by 97 publications (133 citation statements: 0 supporting, 132 mentioning, 0 contrasting; 2016–2023)
References 17 publications (20 reference statements)
“…Such embedding models enable us to design NED models that capture the contextual information required to address NED. These models are typically based on conventional word embedding models (e.g., skip-gram (Mikolov et al., 2013)) that assign a fixed embedding to each word and entity (Yamada et al., 2016; Fang et al., 2016; Tsai and Roth, 2016; Cao et al., 2017; Ganea and Hofmann, 2017). In this study, we aim to test the effectiveness of the pretrained contextualized embeddings for NED.…”
Section: Background and Related Work (mentioning)
confidence: 99%
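The snippet above contrasts fixed (skip-gram-style) word/entity embeddings with contextualized ones. The fixed setup can be sketched as skip-gram with negative sampling over a corpus in which entity tokens (e.g. ENT:Chicago) appear alongside ordinary words, so both share one embedding table. The following is a toy illustration under those assumptions, not the code of any cited system.

```python
# Toy skip-gram with negative sampling; words and entity tokens share one
# vocabulary, so each word and each entity gets a single fixed vector.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "city", "of", "ENT:Chicago", "is", "large"]
idx = {t: i for i, t in enumerate(vocab)}
dim, lr = 25, 0.05
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # input vectors
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))  # output vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, k=3):
    """One SGD step: pull (center, context) together, push k random negatives apart."""
    c, o = idx[center], idx[context]
    negs = rng.integers(0, len(vocab), size=k)
    grad_in = np.zeros(dim)
    for tgt, label in [(o, 1.0)] + [(int(n), 0.0) for n in negs]:
        g = sigmoid(W_in[c] @ W_out[tgt]) - label
        grad_in += g * W_out[tgt]
        W_out[tgt] -= lr * g * W_in[c]
    W_in[c] -= lr * grad_in

# Train on (center, context) pairs from a window around the entity token:
for _ in range(200):
    sgns_step("ENT:Chicago", "city")
```

A contextualized model would instead compute a fresh vector for each occurrence from its sentence, which is exactly the difference the cited study sets out to test.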
“…2). SBWES may be used to support many tasks, e.g., computing cross-lingual/multilingual semantic word similarity (Faruqui and Dyer, 2014), learning bilingual word lexicons (Mikolov et al., 2013a; Gouws et al., 2015), cross-lingual entity linking (Tsai and Roth, 2016), parsing (Guo et al., 2015; Johannsen et al., 2015), machine translation (Zou et al., 2013), or cross-lingual information retrieval (Vulić and Moens, 2015; Mitra et al., 2016).…”
Section: Monolingual vs. Bilingual (mentioning)
confidence: 99%
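Among the SBWES uses listed above, bilingual lexicon learning in the style of Mikolov et al. (2013a) is easy to sketch: fit a linear map from source-language vectors to target-language vectors on a seed lexicon, then translate by nearest neighbour. The example below uses random toy data purely to show the fitting and retrieval steps; all names are illustrative.

```python
# Sketch of linear-mapping bilingual lexicon induction on toy data.
import numpy as np

rng = np.random.default_rng(0)
d, n_seed = 40, 200
X = rng.normal(size=(n_seed, d))                      # source vectors of seed pairs
true_W = rng.normal(size=(d, d))
Y = X @ true_W + 0.01 * rng.normal(size=(n_seed, d))  # noisy target vectors

# Least-squares fit of the mapping W: minimise ||X W - Y||_F.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def translate(src_vec, tgt_matrix):
    """Index of the nearest target vector (cosine) to the mapped source vector."""
    q = src_vec @ W
    sims = (tgt_matrix @ q) / (np.linalg.norm(tgt_matrix, axis=1)
                               * np.linalg.norm(q))
    return int(np.argmax(sims))

print(translate(X[0], Y))  # should retrieve index 0 on this toy data
```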
“…We use the system proposed in Tsai and Roth (2016), which grounds input strings to the intersection of (the title spaces of) the English and the target-language Wikipedias. The only requirement is a multilingual Wikipedia dump, and the method can be applied to all languages in Wikipedia.…”
Section: Cross-lingual Wikifier Features (mentioning)
confidence: 99%
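The "intersection of title spaces" requirement can be pictured as a dictionary built from a Wikipedia dump's inter-language links: only foreign titles that have an English counterpart survive as grounding targets. The sketch below assumes such links have been preprocessed into a two-column TSV; the file layout and helper names are assumptions, not the actual dump schema or the system's code.

```python
# Sketch: restrict candidate titles to the English/target-language intersection.
import csv

def load_langlinks(path):
    """Read (foreign_title, english_title) pairs from a TSV assumed to be
    extracted from a Wikipedia dump's inter-language link table."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for foreign_title, english_title in csv.reader(f, delimiter="\t"):
            mapping[foreign_title] = english_title
    return mapping

def candidates(mention, anchor_index, fr2en):
    """Candidate English titles for a mention: anchor-text lookups in the
    foreign Wikipedia, kept only if the title also exists in English."""
    return [fr2en[t] for t in anchor_index.get(mention, []) if t in fr2en]

# Toy usage with in-memory stand-ins for the dump-derived tables:
fr2en = {"Chicago_(Stadt)": "Chicago"}
anchor_index = {"Chicago": ["Chicago_(Stadt)"]}
print(candidates("Chicago", anchor_index, fr2en))  # ['Chicago']
```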
“…The key contribution of this paper is the development of a method that makes use of cross-lingual wikification and entity linking (Tsai and Roth, 2016; Moro et al., 2014) to generate language-independent features for NER, and showing how useful this can be for training NER models with no annotation in the target language. Given a mention (sub-string) from a document written in a foreign language, the goal of cross-lingual wikification is to find the corresponding English Wikipedia title.…”
[Figure 1: An example of a German sentence.]
Section: Introduction (mentioning)
confidence: 99%
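One way to picture the language-independent features this snippet describes: once a mention is grounded to an English title, properties of that title (types, categories) can be attached to the covered tokens as features for an NER model, regardless of the input language. The sketch below invents a small type inventory for illustration; the feature set in the cited work is richer, and all names here are hypothetical.

```python
# Sketch: turn wikifier output into per-token features for NER.
def wikifier_features(tokens, links, title_types):
    """tokens: list of token strings.
    links: (start, end, english_title) spans produced by a wikifier.
    title_types: English title -> list of type/category strings (assumed)."""
    feats = [[] for _ in tokens]
    for start, end, title in links:
        for i in range(start, end):
            for t in title_types.get(title, []):
                feats[i].append(f"WIKI_TYPE={t}")
    return feats

# Toy German sentence with two grounded mentions:
tokens = ["Die", "Universität", "Chicago", "liegt", "in", "Chicago"]
links = [(1, 3, "University_of_Chicago"), (5, 6, "Chicago")]
title_types = {"University_of_Chicago": ["ORG", "education"],
               "Chicago": ["LOC", "city"]}
for tok, f in zip(tokens, wikifier_features(tokens, links, title_types)):
    print(tok, f)
```

Because the features are derived from English titles, the same feature space applies to any language the wikifier covers, which is what lets an NER model transfer without target-language annotation.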