“…A number of unsupervised approaches based on contextual embeddings have been proposed to sidestep the need for lexicographic resources (Schlechtweg et al., 2020; Tahmasebi et al., 2021). In general, these approaches follow a three-step scheme: i) extraction of an embedding for each occurrence of a target word from a contextual model such as BERT (Hu et al., 2019; Martinc et al., 2020a), ELMo (Kutuzov and Giulianelli, 2020; Rodina et al., 2020), or XLM-R (Cuba Gyllensten et al., 2020; Rother et al., 2020); ii) aggregation of the embeddings with a clustering algorithm such as K-Means (Giulianelli et al., 2020; Cuba Gyllensten et al., 2020), Affinity Propagation (Martinc et al., 2020a; Kutuzov and Giulianelli, 2020), or DBSCAN (Rother et al., 2020; Karnysheva and Schwarz, 2020); iii) comparison of the distribution of vectors over clusters across time periods, using a semantic distance measure such as the Jensen-Shannon divergence (Martinc et al., 2020a), Entropy Difference (Giulianelli et al., 2020), or the Wasserstein Distance (Montariol et al., 2021). The main limitation of applying clustering to word embeddings is scalability, in terms of both memory consumption and computation time.…”
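
To make the three-step scheme concrete, the following is a minimal sketch in Python, assuming BERT for step (i), K-Means for step (ii), and the Jensen-Shannon divergence for step (iii). The corpora, target word, and cluster count are illustrative placeholders, not taken from any of the cited works; Affinity Propagation or DBSCAN could replace K-Means in step (ii) without changing the overall structure.

```python
# Minimal sketch of the three-step LSC scheme (illustrative assumptions:
# BERT embeddings, K-Means clustering, Jensen-Shannon divergence).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed_occurrences(sentences, target):
    """Step i: one contextual embedding per occurrence of `target`."""
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    vectors = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        ids = enc["input_ids"][0].tolist()
        # Locate the target's wordpiece span inside the sentence.
        for i in range(len(ids) - len(target_ids) + 1):
            if ids[i:i + len(target_ids)] == target_ids:
                with torch.no_grad():
                    hidden = model(**enc).last_hidden_state[0]
                # Mean-pool the target's subword vectors.
                vectors.append(hidden[i:i + len(target_ids)].mean(0).numpy())
                break
    return np.vstack(vectors)

def cluster_distribution(labels, n_clusters):
    """Relative frequency of each cluster (smoothed to avoid zero bins)."""
    counts = np.bincount(labels, minlength=n_clusters) + 1e-9
    return counts / counts.sum()

# Hypothetical diachronic corpora: occurrences of "cell" in two periods.
sents_t1 = ["The monk prayed alone in his cell.", "Each prisoner had a cell."]
sents_t2 = ["The cell divides during mitosis.", "She answered her cell phone."]

emb_t1 = embed_occurrences(sents_t1, "cell")
emb_t2 = embed_occurrences(sents_t2, "cell")

# Step ii: cluster all occurrences jointly so cluster ids are shared
# across periods (k is a placeholder; Affinity Propagation infers it).
k = 2
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    np.vstack([emb_t1, emb_t2]))

# Step iii: compare the cluster distributions of the two time periods.
p = cluster_distribution(labels[:len(emb_t1)], k)
q = cluster_distribution(labels[len(emb_t1):], k)
# scipy returns the JS *distance*; squaring yields the divergence.
change_score = jensenshannon(p, q, base=2) ** 2
print(f"JSD change score for 'cell': {change_score:.3f}")
```

Note that the sketch embodies the scalability limitation named above: every occurrence vector must be held in memory before clustering, which is what makes these pipelines costly on large diachronic corpora.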