2019 IEEE International Conference on Data Mining (ICDM)
DOI: 10.1109/icdm.2019.00023

Closed Form Word Embedding Alignment

Abstract: We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec). Our methods are simple and have a closed form to optimally rotate, translate, and scale to minimize root mean squared errors or maximize the average cosine similarity between two embeddings of the same vocabulary into the same dimensional space. Our methods extend approaches known as Absolute Orientation, which are popular for aligning obje…
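The closed-form rotation the abstract describes can be sketched as an orthogonal-Procrustes-style solve via SVD. This is a minimal illustration under that assumption; the function name and test setup below are hypothetical, not the paper's exact algorithm or API.

```python
import numpy as np

def align_embeddings(A, B):
    """Closed-form rotation minimizing ||A R - B||_F (Procrustes-style sketch).

    A and B are embeddings of the same vocabulary: row i of each
    matrix is the vector for the same word.
    """
    # SVD of the cross-covariance yields the optimal rotation.
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Synthetic check: if A is a rotated copy of B, alignment recovers B.
rng = np.random.default_rng(0)
B = rng.standard_normal((100, 50))
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))  # random rotation
A = B @ Q.T
R = align_embeddings(A, B)
print(np.allclose(A @ R, B, atol=1e-8))  # True
```

The SVD-based solve is the standard closed form for the orthogonal case; translation and scaling (also mentioned in the abstract) would be additional closed-form steps around it.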

Cited by 5 publications (5 citation statements)
References 26 publications
“…Multiple fitting iterations result in sets of vectors with similar relative positions among each other, i.e. similar cosine angles between node pairs, but they generally do not retain their absolute values (Dev et al., 2019; for the stability of the relations among embeddings see Wang et al., 2020). As a result, utilizing the CE framework to explore individual differences among subjects requires a method which would align different CEs to the same latent space (Fig.…”
Section: Results
confidence: 99%
“…Independent fitting iterations of the node2vec algorithm resulted in sets of vectors with similar cosine angle between each node pairs, but not necessarily similar absolute values (Dev et al., 2019). Here we demonstrate our novel approach which enables us to align separately learned CE to the same latent space (see Fig.…”
Section: Methods
confidence: 99%
“…However, if these representations can be used for LR understanding of the represented features, properties such as the relative positions of word-embeddings used to complete analogies should be evident in the respective representations despite these differences. Dev et al. (2019) explain that "rotation or scaling of the entire dataset will not affect synonyms (nearest neighbors), linear substructures (dot products), analogies, or linear classifiers" because "there is nothing extrinsic about any of these properties." For example, in studying the impact of basis rotations to align GloVe (Pennington et al. 2014) and Word2Vec (Mikolov et al. 2013) embeddings, they confirm that using a vector from Word2Vec to complete an analogy using GloVe embeddings "is very poor, close to 0; that is, extrinsically there is very little information carried over" by the (basis dependent) parameter values themselves.…”
Section: The Target of ML Models
confidence: 99%
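The intrinsic-versus-extrinsic distinction in the quote above can be checked numerically: a global rotation leaves all dot products (and therefore nearest neighbors) intact while changing every coordinate. This is a small illustrative sketch, not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 10))       # toy "embedding", 50 words, dim 10
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # random rotation
X_rot = X @ Q

# Intrinsic properties survive: the full dot-product matrix is unchanged,
# so nearest-neighbor and analogy structure is preserved.
print(np.allclose(X @ X.T, X_rot @ X_rot.T))  # True

# Extrinsic coordinates do not survive: the raw parameter values differ,
# which is why mixing vectors across unaligned embeddings fails.
print(np.allclose(X, X_rot))  # False
```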
“…6 ) between them. DSMs are first aligned using absolute orientation with scaling (see Algorithm 1 below from Dev et al., 2018, originally Algorithm 2.4 in their paper) where the optimal alignment is obtained by minimizing the sum of squared errors under the Euclidean distance between all pairs of common data points, using linear transformations (rotation and scaling) which do not alter inner cosine similarity metrics and hence preserve measures of pairwise lexical similarity.…”
Section: Model and Experimental Setup
confidence: 99%
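The rotation-plus-scaling alignment described in the last citation statement can be sketched with the standard SVD-based closed form: the rotation comes from the SVD of the cross-covariance, and the optimal uniform scale is the trace of the singular values divided by the squared Frobenius norm. This is an assumed reconstruction for illustration, not a verbatim copy of the paper's Algorithm 2.4; the key property checked is that rotation and positive scaling leave pairwise cosine similarities untouched.

```python
import numpy as np

def align_with_scaling(A, B):
    """Rotation plus uniform scale minimizing ||s*A*R - B||_F (sketch)."""
    U, S, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt                       # optimal rotation
    s = S.sum() / (A ** 2).sum()     # optimal uniform scale
    return s * (A @ R)

def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 8))
B = rng.standard_normal((20, 8))
A_aligned = align_with_scaling(A, B)

# Rotation and a single positive scale preserve pairwise cosine
# similarity, as the citation statement notes.
print(np.allclose(cosine_matrix(A), cosine_matrix(A_aligned)))  # True
```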