Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation

Yamada, Ikuya; Shindo, Hiroyuki; Takeda, Hideaki; Takefuji, Yoshiyasu

doi:10.18653/v1/k16-1025

Cited by 297 publications

(472 citation statements)

References 33 publications

Supporting

Mentioning

457

Contrasting

Unclassified

“…There are two key differences between all these close previous work [3,4,14,25] and ours. First, unlike these past approaches, we tackle the readability of word senses and polysemy problems by learning document representations that leverage semantics inventoried in both text corpora and knowledge resources through fine-grained elements including words and concepts in a joint learning process.…”

Section: Neural Approaches Empowered By Knowledge Resources For

contrasting

confidence: 65%

“…To tackle the above problems, neural approaches investigated the joint use of both corpus-based word distributions and knowledge resources to achieve more accurate text representations [6,13,14,25].…”

Section: Neural Approaches Empowered By Knowledge Resources For

mentioning

confidence: 99%

“…The second and recent line of work aims at refining word embedding using relational constraints to better discriminate word senses by simultaneously learning the concept representations and inferring word senses, and accordingly tackling the polysemy issue [3,4,14,25]. Mancini et al [14] simultaneously learn embeddings for both words and their senses via a semantic network based on the CBOW architecture.…”

Section: Neural Approaches Empowered By Knowledge Resources For

mentioning

confidence: 99%

“…Unlikely, Cheng et al [3] assume that polysemy can be captured through context words and therefore propose to compute parallel word-concept skip-grams for each context word by introducing their associated concept in the prediction. In the same mind, Yamada et al [25] propose a Named Entity disambiguation model that exploits word and concept embeddings learned in a two-step methodology. More particularly, word and concept latent spaces are first learned separately in skip-gram frameworks and then are aligned using word-concept anchors derived from the knowledge resource.…”

Section: Neural Approaches Empowered By Knowledge Resources For

mentioning

confidence: 99%

A Tri-Partite Neural Document Language Model for Semantic Information Retrieval

et al. 2018

Abstract. Previous work in information retrieval have shown that using evidence, such as concepts and relations, from external knowledge resources could enhance the retrieval performance. Recently, deep neural approaches have emerged as state-of-the art models for capturing word semantics that can also be efficiently injected in IR models. This paper presents a new tri-partite neural document language framework that leverages explicit knowledge to jointly constrain word, concept, and document learning representations to tackle a number of issues including polysemy and granularity mismatch. We show the effectiveness of the framework in various IR tasks including document similarity, document re-ranking, and query expansion.

“…They further extended the model by integrating category structure to capture meaningful semantic relationships between entities and categories. Yamada et al (2016) learned joint embedding for words and entities. Tsai and Roth (2016) proposed a way to learn multilingual embedding of words and entities.…”

Section: Entity Embedding

mentioning

confidence: 99%

Cross-language Article Linking Using Cross-Encyclopedia Entity Embedding

Tsai

2018

Proceedings of the 2018 Conference of the North American Chapter Of the Association for Computational Linguistics: Hu

Cross-language article linking (CLAL) is the task of finding corresponding article pairs of different languages across encyclopedias. This task is a difficult disambiguation problem in which one article must be selected among several candidate articles with similar titles and contents. Existing works focus on engineering text-based or link-based features for this task, which is a time-consuming job, and some of these features are only applicable within the same encyclopedia. In this paper, we address these problems by proposing cross-encyclopedia entity embedding. Unlike other works, our proposed method does not rely on known cross-language pairs. We apply our method to CLAL between English Wikipedia and Chinese Baidu Baike. Our features improve performance relative to the baseline by 29.62%. Tested 30 times, our system achieved an average improvement of 2.76% over the current best system (26.86% over baseline), a statistically significant result.

Aligning Knowledge Base and Document Embedding Models Using Regularized Multi-Task Learning

Baumgartner, Zhang, Paudel, et al. 2018

Lecture Notes in Computer Science

Abstract. Knowledge Bases (KBs) and textual documents contain rich and complementary information about real-world objects, as well as relations among them. While text documents describe entities in freeform, KBs organizes such information in a structured way. This makes these two information representation forms hard to compare and integrate, limiting the possibility to use them jointly to improve predictive and analytical tasks. In this article, we study this problem, and we propose KADE, a solution based on a regularized multi-task learning of KB and document embeddings. KADE can potentially incorporate any KB and document embedding learning method. Our experiments on multiple datasets and methods show that KADE effectively aligns document and entities embeddings, while maintaining the characteristics of the embedding models.

⋆ M. Baumgartner, W. Zhang, and B. Paudel contributed equally to this work.

…when users need open knowledge from different repositories, and when users need to combine open and private knowledge.

Key to the success of data integration is the alignment process, i.e. the combination of descriptions that refer to the same real-world object. This is because those descriptions come from data sources that are heterogeneous not only in content, but also in structure (different aspects of an object can be modelled in diverse ways) and format, e.g. relational database, text, sound and images. In this article, we describe the problem of KB entity-document alignment. Different from previous studies, we assume that the same real-world object is described as a KB entity and a text document. Note that the goal is not to align an entity with its surface forms, but rather with a complete document. We move a step towards the solution by using existing embedding models for KBs and documents.

A first problem we face in our research is how to enable comparison and contrast of entities and documents. We identify embedding models as a possible solution. These models represent each entity in a KB, or each document in a text corpus, by an embedding, a real-valued vector. Embeddings are represented in vector spaces which preserve some properties, such as similarity. Embeddings gained popularity in a number...
