Proceedings of the Fourteenth Workshop on Semantic Evaluation 2020
DOI: 10.18653/v1/2020.semeval-1.22

CMCE at SemEval-2020 Task 1: Clustering on Manifolds of Contextualized Embeddings to Detect Historical Meaning Shifts

Abstract: This paper describes the system Clustering on Manifolds of Contextualized Embeddings (CMCE) submitted to the SemEval-2020 Task 1 on Unsupervised Lexical Semantic Change Detection. Subtask 1 asks to identify whether or not a word gained/lost a sense across two time periods. Subtask 2 is about computing a ranking of words according to the amount of change their senses underwent. Our system uses contextualized word embeddings from MBERT, whose dimensionality we reduce with an autoencoder and the UMAP algorithm, t…
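The abstract is cut off before it explains how CMCE turns clusters into answers for the two subtasks. What follows is therefore only a minimal sketch of one common approach, not the system's own scoring. It assumes every usage of a target word already has a cluster label (a putative sense) and belongs to one of the two time periods. Subtask 2 is approximated by the Jensen-Shannon divergence between the two periods' cluster-frequency distributions. Subtask 1 is approximated by a threshold rule that flags a word when some cluster is rare in one period but well attested in the other. The function names and the thresholds `k` and `n` are hypothetical placeholders.

```python
from collections import Counter
from math import log2


def sense_distributions(labels_t1, labels_t2):
    """Turn per-usage cluster labels from two periods into aligned probability lists."""
    c1, c2 = Counter(labels_t1), Counter(labels_t2)
    senses = sorted(set(c1) | set(c2))
    p = [c1[s] / len(labels_t1) for s in senses]
    q = [c2[s] / len(labels_t2) for s in senses]
    return senses, p, q


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint support."""
    def kl(a, b):
        # Terms with a zero numerator contribute nothing; b > 0 wherever a > 0.
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gained_or_lost(labels_t1, labels_t2, k=2, n=5):
    """Hypothetical binary rule: some cluster has <= k usages in one period
    and >= n usages in the other (k and n are placeholder thresholds)."""
    c1, c2 = Counter(labels_t1), Counter(labels_t2)
    for s in set(c1) | set(c2):
        if (c1[s] <= k and c2[s] >= n) or (c2[s] <= k and c1[s] >= n):
            return True
    return False
```

The divergence uses base-2 logarithms, so it lies in [0, 1]. Scores for different target words can therefore be sorted directly to produce a Subtask 2 ranking.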

Cited by 10 publications (12 citation statements). References: 15 publications.
“…NLPCR (Rother et al, 2020): The team uses multilingual contextualized word embeddings (Devlin et al, 2019) to represent a word's meaning. They then reduce the embedding dimensionality with either autoencoder or UMAP, and cluster the resulting representation with either GMM or HDBSCAN.…”
Section: Discussion (mentioning)
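The pipeline this statement describes (contextualized embeddings, then dimensionality reduction, then clustering) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions and is not the authors' code. PCA stands in for the autoencoder or UMAP step, and the embeddings are assumed to have already been extracted from multilingual BERT. Choosing the number of Gaussian components by BIC is an assumed selection rule. `cluster_usages` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def cluster_usages(embeddings, n_dims=10, max_senses=5, seed=0):
    """Reduce one word's usage embeddings and cluster them into putative senses.

    embeddings: array of shape (n_usages, hidden_size), pooled over both periods.
    Returns one integer cluster label per usage.
    """
    # PCA is a stand-in for the autoencoder/UMAP reduction named in the source.
    n_dims = min(n_dims, *embeddings.shape)
    reduced = PCA(n_components=n_dims, random_state=seed).fit_transform(embeddings)

    # Fit GMMs with 1..max_senses components and keep the lowest-BIC labelling
    # (an assumed model-selection rule).
    best_bic, best_labels = np.inf, None
    for k in range(1, max_senses + 1):
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(reduced)
        bic = gmm.bic(reduced)
        if bic < best_bic:
            best_bic, best_labels = bic, gmm.predict(reduced)
    return best_labels
```

For the density-based variant, scikit-learn 1.3 and later also provides `sklearn.cluster.HDBSCAN`. It infers the number of clusters on its own and labels outlier usages as `-1`. The UMAP step itself would normally come from the separate `umap-learn` package.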
“…Dimensionality Reduction: To the best of our knowledge, only one previous semantic change detection approach (Rother et al, 2020) has incorporated dimensionality reduction, more specifically UMAP (McInnes et al, 2018). As the Euclidean distances in the UMAP-reduced space are very sensitive to hyperparameters and UMAP does not retain an interpretable notion of absolute distances, it might be unsuitable for pure distance-based metrics like APD, and we therefore also experiment with PCA.…”
Section: Quantifying Semantic Change (mentioning)
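To make the distance-based metric concrete: APD (average pairwise distance) is commonly computed as the mean distance between every usage embedding from the first period and every usage embedding from the second. The minimal NumPy sketch below assumes that definition. It uses cosine distance by default, with Euclidean distance as an option, and implements PCA through an SVD. PCA is fitted jointly on both periods so that the two sets of usages share one projected space. All function names are hypothetical.

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of x onto their top principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def apd(e1: np.ndarray, e2: np.ndarray, metric: str = "cosine") -> float:
    """Average distance over all (period-1 usage, period-2 usage) pairs."""
    if metric == "cosine":
        a = e1 / np.linalg.norm(e1, axis=1, keepdims=True)
        b = e2 / np.linalg.norm(e2, axis=1, keepdims=True)
        dists = 1.0 - a @ b.T
    elif metric == "euclidean":
        dists = np.linalg.norm(e1[:, None, :] - e2[None, :, :], axis=-1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return float(dists.mean())


def apd_with_pca(e1, e2, n_components=None, metric="cosine"):
    """Optionally reduce both periods jointly with PCA, then compute APD."""
    if n_components is not None:
        joint = pca_reduce(np.vstack([e1, e2]), n_components)
        e1, e2 = joint[: len(e1)], joint[len(e1):]
    return apd(e1, e2, metric)
```

Unlike UMAP, PCA is a linear projection, so distances in the reduced space remain interpretable. When the number of components equals the rank of the data, pairwise Euclidean distances are preserved exactly.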
“…UMAP (Uniform Manifold Approximation and Projection for Dimension Reduction) is used as in Rother et al (2020) to lower the dimensionality of the embedding space. UMAP is appropriate for this task since it preserves global structure better than other manifold learning dimensionality reduction methods such as t-SNE (McInnes et al, 2018; McConville et al, 2021).…”
Section: Dimensionality Reduction (mentioning)
“…In a similar vein, Aharoni and Goldberg (2020) showed that the domain type of a particular text could be identified using the clustering of sentence-level representations. Finally, Rother et al (2020) showed that clusters of contextualised embeddings could detect meaning shifts in words. The success of these papers motivates our use of high-density clusters of sentence-level representations.…”
Section: Introduction (mentioning)