Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

Neelakantan, Arvind; Shankar, Jeevan; Passos, A. M. A. dos; McCallum, Andrew

doi:10.48550/arxiv.1504.06654

Cited by 16 publications

(36 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…[12]). It would also be interesting to see the performance of this set-based representation obtained with a multi-sense word embedding method, such as [10].…”

Section: Discussionmentioning

confidence: 99%

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

Roy¹,

Ganguly²,

Mitra³

et al. 2016

Preprint

View full text Add to dashboard Cite

A major difficulty in applying word vector embeddings in information retrieval is in devising an effective and efficient strategy for obtaining representations of compound units of text, such as whole documents, (in comparison to the atomic words), for the purpose of indexing and scoring documents. Instead of striving for a suitable method to obtain a single vector representation of a large document of text, we aim to develop a similarity metric that makes use of the similarities between the individual embedded word vectors in a document and a query. More specifically, we represent a document and a query as sets of word vectors, and use a standard notion of similarity measure between these sets, computed as a function of the similarities between each constituent word pair from these sets. We then make use of this similarity measure in combination with standard information retrieval based similarities for document ranking. The results of our initial experimental investigations show that our proposed method improves MAP by up to 5.77%, in comparison to standard text-based language model similarity, on the TREC 6, 7, 8 and Robust ad-hoc test collections. CCS Concepts•Information systems → Content analysis and feature selection;

show abstract

“…[12]). It would also be interesting to see the performance of this set-based representation obtained with a multi-sense word embedding method, such as [10].…”

Section: Discussionmentioning

confidence: 99%

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

Roy¹,

Ganguly²,

Mitra³

et al. 2016

Preprint

View full text Add to dashboard Cite

show abstract

“…Li et al shows that embedding that is aware of multiple word senses and provides vectors for each specific sense does improve the performance for some NLP tasks [14]. For this issue, some utilize the local context information and clustering for identifying word sense [42][43][44], some resort to external lexical database for disambiguation [45,15,46,13,47,48], while some combine topic modeling methods with embedding [49][50][51][52]. We adopt the idea of assigning multiple vectors to each node in the graph to represent different roles as well as exploiting local graph structure for the purpose.…”

Section: Related Workmentioning

confidence: 99%

Persona2vec: a flexible multi-role representations learning framework for graphs

Yoon

Yang

Jung

et al. 2021

PeerJ Computer Science

View full text Add to dashboard Cite

Graph embedding techniques, which learn low-dimensional representations of a graph, are achieving state-of-the-art performance in many graph mining tasks. Most existing embedding algorithms assign a single vector to each node, implicitly assuming that a single representation is enough to capture all characteristics of the node. However, across many domains, it is common to observe pervasively overlapping community structure, where most nodes belong to multiple communities, playing different roles depending on the contexts. Here, we propose persona2vec, a graph embedding framework that efficiently learns multiple representations of nodes based on their structural contexts. Using link prediction-based evaluation, we show that our framework is significantly faster than the existing state-of-the-art model while achieving better performance.

show abstract

“…Related Work on Multi-Sense Embedding The idea of multiple aspects is in a way related to the polysemy of words. There have been some studies on inferring multi-sense embeddings of words [2,11,14,26], which aims at inferring multiple embedding vectors for each word. However, the two tasks differ significantly in the following perspectives.…”

Section: Supplementary Materialsmentioning

confidence: 99%

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

Shi¹,

Gui²,

Zhu³

et al. 2018

Proceedings of the 2018 SIAM International Conference on Data Mining

View full text Add to dashboard Cite

Heterogeneous information networks (HINs) are ubiquitous in real-world applications. Due to the heterogeneity in HINs, the typed edges may not fully align with each other. In order to capture the semantic subtlety, we propose the concept of aspects with each aspect being a unit representing one underlying semantic facet. Meanwhile, network embedding has emerged as a powerful method for learning network representation, where the learned embedding can be used as features in various downstream applications. Therefore, we are motivated to propose a novel embedding learning framework—ASPEM—to preserve the semantic information in HINs based on multiple aspects. Instead of preserving information of the network in one semantic space, ASPEM encapsulates information regarding each aspect individually. In order to select aspects for embedding purpose, we further devise a solution for ASPEM based on dataset-wide statistics. To corroborate the efficacy of ASPEM, we conducted experiments on two real-words datasets with two types of applications—classification and link prediction. Experiment results demonstrate that ASPEM can outperform baseline network embedding learning methods by considering multiple aspects, where the aspects can be selected from the given HIN in an unsupervised manner.

show abstract

Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

Cited by 16 publications

References 0 publications

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

Persona2vec: a flexible multi-role representations learning framework for graphs

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

Contact Info

Product

Resources

About