Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015
DOI: 10.3115/v1/P15-1014
Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations

Abstract: Vector space representation of words has been widely used to capture fine-grained linguistic regularities, and has proven successful in various natural language processing tasks in recent years. However, existing models for learning word representations focus on either syntagmatic or paradigmatic relations alone. In this paper, we argue that it is beneficial to jointly model both relations, so that we can not only encode different types of linguistic properties in a unified way, but also boost the represen…
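The distinction the abstract draws can be made concrete with a toy example. The sketch below is not the paper's model (PDC/HDC are not reproduced here); it is a minimal illustration, under the usual distributional reading, of syntagmatic association (words that co-occur in the same context, e.g. "wolf" and "fierce") versus paradigmatic similarity (words that share contexts and can substitute for each other, e.g. "wolf" and "tiger").

```python
# Minimal sketch (not the paper's model): first-order co-occurrence as a
# proxy for syntagmatic association, and similarity of context-count
# vectors as a proxy for paradigmatic similarity, on a toy corpus.
from collections import Counter, defaultdict

corpus = [
    "the fierce wolf chased the deer",
    "the fierce tiger chased the deer",
    "the wolf howled at the moon",
]

cooc = defaultdict(Counter)
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        for j, c in enumerate(words):
            if i != j:
                cooc[w][c] += 1  # direct (syntagmatic) co-occurrence counts

def syntagmatic(w1, w2):
    """How often the two words appear together in the same sentence."""
    return cooc[w1][w2]

def paradigmatic(w1, w2):
    """Cosine similarity of context-count vectors: shared contexts."""
    v1, v2 = cooc[w1], cooc[w2]
    dot = sum(v1[c] * v2[c] for c in set(v1) & set(v2))
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(v1) * norm(v2) + 1e-12)

print(syntagmatic("wolf", "fierce"))   # high: they co-occur directly
print(paradigmatic("wolf", "tiger"))   # high: they appear in shared contexts
```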

Cited by 42 publications (37 citation statements)
References 19 publications

“…Results of the word analogy test are often accompanied by a visualisation of word vectors projected onto the two-dimensional plane using Principal Component Analysis (PCA) (e.g., Mikolov et al., 2013a; Sun et al., 2015). Though these visualisations are generally not claimed to be part of the evaluation, they are included to convince the reader of the quality of the word embeddings with respect to word analogies: the line connecting a₁ and b₁ is approximately parallel to the line through a₂ and b₂ whenever word analogy recovery is optimal (as in Equation (2)).…”
Section: PCA to two dimensions from dimension d can be misleading
confidence: 99%
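As a rough illustration of the visual check described in this excerpt, the sketch below projects the four words of one analogy to two dimensions with PCA and compares the projected offsets. Here `emb` is a hypothetical word-to-vector dictionary (e.g., loaded from pre-trained embeddings), not an artifact of the cited work; and, as the section title warns, near-parallel offsets in the 2D projection need not reflect the geometry of the full d-dimensional space.

```python
# Sketch of the usual analogy visualisation: project a1, b1, a2, b2 to 2D
# with PCA and check whether the offsets a1->b1 and a2->b2 look parallel.
import numpy as np
from sklearn.decomposition import PCA

def projected_offset_cosine(emb, a1, b1, a2, b2):
    X = np.stack([emb[w] for w in (a1, b1, a2, b2)])   # (4, d) matrix
    X2 = PCA(n_components=2).fit_transform(X)          # project d -> 2 dims
    off1 = X2[1] - X2[0]                               # projected a1 -> b1
    off2 = X2[3] - X2[2]                               # projected a2 -> b2
    return float(off1 @ off2 /
                 (np.linalg.norm(off1) * np.linalg.norm(off2)))
    # close to 1.0 when the two lines look parallel in the 2D plot

# e.g., projected_offset_cosine(emb, "king", "queen", "man", "woman")
```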
“…To approximate novelty, we use word embeddings (computed over the OMCS corpus) to calculate the distance d(a, b) = ||head(a) − head(b)||_2 + ||tail(a) − tail(b)||_2, where head and tail are represented by the average of word embeddings. Such a formulation is related to the concept of paradigmatic similarity (Sahlgren, 2006), and word embedding-based distance can approximate paradigmatic similarity (Sun et al., 2015). Two words are paradigmatically similar if one can be substituted for the other while maintaining the syntactic correctness of the sentence (e.g.…”
Section: Automatically measuring novelty
confidence: 99%
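The quoted distance is simple to compute once phrase vectors are formed. Below is a minimal sketch of that formula under the stated assumption that head and tail phrases are represented by the average of their word embeddings; `emb` is a hypothetical word-to-vector lookup, and out-of-vocabulary handling is omitted.

```python
# Sketch of d(a, b) = ||head(a) - head(b)||_2 + ||tail(a) - tail(b)||_2,
# with each phrase represented as the mean of its word embeddings.
import numpy as np

def phrase_vec(emb, phrase):
    """Average the word embeddings of a head or tail phrase."""
    return np.mean([emb[w] for w in phrase.split()], axis=0)

def novelty_distance(emb, head_a, tail_a, head_b, tail_b):
    return (np.linalg.norm(phrase_vec(emb, head_a) - phrase_vec(emb, head_b))
            + np.linalg.norm(phrase_vec(emb, tail_a) - phrase_vec(emb, tail_b)))
```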
“…For some tasks, the evaluation is conducted directly over the embedding (e.g., measuring the cosine similarity between word vectors); for others, a classifier is trained on top of it. Pre-trained Embedding. We perform experiments with the GloVe embedding (Pennington, Socher, and Manning 2014) and the HDC embedding (Sun et al. 2015). The GloVe embedding is trained from 42B tokens of Common Crawl data.…”
Section: Results
confidence: 99%
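For the first kind of evaluation mentioned in this excerpt, cosine similarity can be computed directly on the pre-trained vectors. The sketch below assumes embeddings stored in the plain-text GloVe format (each line holds a word followed by its vector components); the file path is a placeholder, not a resource taken from the cited papers.

```python
# Sketch of intrinsic evaluation over pre-trained vectors: load a
# text-format embedding file and compute cosine similarity between words.
import numpy as np

def load_embeddings(path):
    emb = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            emb[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return emb

def cosine(emb, w1, w2):
    v1, v2 = emb[w1], emb[w2]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# emb = load_embeddings("glove.42B.300d.txt")  # placeholder path
# cosine(emb, "king", "queen")
```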