2017
DOI: 10.1371/journal.pone.0184544
Learning linear transformations between counting-based and prediction-based word embeddings

Abstract: Despite the growing interest in prediction-based word embedding learning methods, it remains unclear as to how the vector spaces learnt by the prediction-based methods differ from those of the counting-based methods, or whether one can be transformed into the other. To study the relationship between counting-based and prediction-based embeddings, we propose a method for learning a linear transformation between two given sets of word embeddings. Our proposal contributes to the word embedding learning research in…
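For intuition, the sketch below illustrates the general idea the abstract describes: fitting a linear map from one embedding space to another over a shared vocabulary. It is an ordinary least-squares illustration, not the paper's actual training procedure or loss function, and the matrix names and sizes are invented for the example.

```python
# Illustrative sketch (not the paper's exact method): fit a linear map M
# that sends counting-based vectors to prediction-based vectors for a
# shared vocabulary, by minimising the Frobenius norm ||C @ M - P||_F^2.
import numpy as np

rng = np.random.default_rng(0)
n_words, d_count, d_pred = 1000, 300, 300   # hypothetical sizes

C = rng.normal(size=(n_words, d_count))     # counting-based embeddings (rows = words)
P = rng.normal(size=(n_words, d_pred))      # prediction-based embeddings (same row order)

# Closed-form least-squares solution for the linear transformation.
M, *_ = np.linalg.lstsq(C, P, rcond=None)

# Map the counting-based vectors into the prediction-based space.
projected = C @ M
rel_error = np.linalg.norm(projected - P) / np.linalg.norm(P)
print(f"relative reconstruction error: {rel_error:.3f}")
```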

Cited by 7 publications (8 citation statements)
References 20 publications
“…A couple of very recent papers propose methods to align embeddings after their construction, but focus on affine transformations, as opposed to the more restrictive but distance-preserving rotations of our method. Bollegala et al. [4] use gradient descent, for parameter γ, to directly optimize…”
Section: Related Approaches (mentioning)
confidence: 99%
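As background for the distinction drawn in this excerpt, the sketch below contrasts an unconstrained linear map found by least squares with the distance-preserving rotation obtained from orthogonal Procrustes. The data and variable names are illustrative and are not taken from either paper.

```python
# Contrast sketch: unconstrained linear alignment vs. orthogonal rotation.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))                 # source embeddings (toy data)
Y = rng.normal(size=(500, 100))                 # target embeddings (toy data)

# Unconstrained linear map, as in the affine-style alignment approaches.
W_lin, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Orthogonal Procrustes: restrict the map to a rotation W = U V^T, taken
# from the SVD of X^T Y, so pairwise distances in the source space are kept.
U, _, Vt = np.linalg.svd(X.T @ Y)
W_rot = U @ Vt

print(np.allclose(W_rot @ W_rot.T, np.eye(100)))  # True: W_rot is orthogonal
```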
“…GloVe: [4] The GloVe model is a log-bilinear model based on ratios of word-word co-occurrence frequencies. The training objective is for the dot product of the vectors learned for words to equal the logarithm of their co-occurrence frequency.…”
Section: Different Word Embeddings (mentioning)
confidence: 99%
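A compact sketch of the weighted least-squares objective this excerpt describes is given below. The weighting constants follow the published GloVe defaults; the array names and toy shapes are illustrative assumptions.

```python
# Sketch of the GloVe objective: drive w_i . w~_j + b_i + b~_j towards
# log(X_ij), weighted by f(X_ij), over observed co-occurrence counts X.
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    # f(x) = min((x / x_max)^alpha, 1), the standard GloVe weighting.
    return np.minimum((x / x_max) ** alpha, 1.0)

def glove_loss(W, W_ctx, b, b_ctx, X):
    """Weighted squared error between dot products (+ biases) and log counts.

    W, W_ctx : (V, d) word and context vectors
    b, b_ctx : (V,) bias terms
    X        : (V, V) co-occurrence counts
    """
    i, j = np.nonzero(X)                       # only observed co-occurrences
    pred = np.sum(W[i] * W_ctx[j], axis=1) + b[i] + b_ctx[j]
    return np.sum(glove_weight(X[i, j]) * (pred - np.log(X[i, j])) ** 2)
```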
“…Linear transformations are regularly used to transfer between different embeddings or to adapt to a new domain (Bollegala et al., 2017; Arora et al., 2018b). The linear transformation can encode contextual information, an idea utilized recently by Khodak et al. (2018), who applied a linear transformation to the DisC embedding scheme to construct a new embedding scheme (referred to as à la carte embedding), and empirically showed that it outperforms many other popular word sequence embedding schemes.…”
Section: Introduction (mentioning)
confidence: 99%
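The à la carte construction referenced here can be sketched, under simplified assumptions, as learning a matrix that maps a word's average context vector to its own embedding and then reusing that matrix to induce vectors for unseen items. The names and toy data below are illustrative, not the cited authors' code.

```python
# Simplified sketch of the à la carte idea (Khodak et al., 2018).
import numpy as np

rng = np.random.default_rng(2)
V, d = 2000, 100
word_vecs = rng.normal(size=(V, d))          # pretrained word embeddings (toy data)
avg_ctx_vecs = rng.normal(size=(V, d))       # average context vector per word (toy data)

# Fit A by least squares so that avg_ctx_vec @ A approximates word_vec.
A, *_ = np.linalg.lstsq(avg_ctx_vecs, word_vecs, rcond=None)

def induce_vector(context_vectors, A):
    """Embed an unseen word or phrase from the vectors of its context words."""
    return np.mean(context_vectors, axis=0) @ A

new_vec = induce_vector(word_vecs[:5], A)    # toy example: five context vectors
print(new_vec.shape)                         # (100,)
```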
“…One promising approach is the use of other information such as multimodal information (Bruni et al., 2014; Kiela et al., 2014; Kiela and Clark, 2015; Kiela et al., 2015a; Silberer et al., 2017) and language resources (Kiela et al., 2015b; Rothe and Schütze, 2017; Yu and Dredze, 2014). Other refinement methods include task-specific embeddings (Bolukbasi et al., 2016; Yu et al., 2017) and the selective use of multiple embeddings (Bollegala et al., 2017; Kiela et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%