Capturing Evolution in Word Usage: Just Add More Clusters?

Martinc, Matej; Montariol, Syrielle; Zosa, Elaine; Pivovarova, Lidia

doi:10.1145/3366424.3382186

Cited by 33 publications

(55 citation statements)

References 19 publications


“…Jensen-Shannon divergence (JSD). In this measure, influenced by Dubossarsky et al (2015), Martinc et al (2020) and , word usage matrices from two time periods are first stacked into one matrix. Then, we standardize the vectors and obtain word usage clusters of token embeddings using the Affinity Propagation clustering algorithm (Frey and Dueck, 2007).…”

Section: Contextualized Embeddings (mentioning)

confidence: 99%

RuSemShift: a dataset of historical lexical semantic change in Russian

Rodina, Kutuzov

2020

Proceedings of the 28th International Conference on Computational Linguistics


We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian for two long-term time period pairs: from the pre-Soviet through the Soviet times and from the Soviet through the post-Soviet times. Target words were annotated by multiple crowd-source workers. The annotation process was organized following the DURel framework and was based on sentence contexts extracted from the Russian National Corpus. Additionally, we report the performance of several distributional approaches on RuSemShift, achieving promising results, which at the same time leave room for other researchers to improve.
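The RuSemShift excerpt above is cut off before the comparison step. The sketch below assumes, as the name of the measure suggests, that the usages from each period are turned into a distribution over the shared clusters and that the two distributions are then compared with the Jensen-Shannon divergence. The cluster labels are hypothetical and hard-coded; in the described pipeline they would come from Affinity Propagation run on the stacked, standardized usage matrices. The function names are illustrative, not taken from any of the cited papers.

```python
import math
from collections import Counter


def cluster_distribution(labels, cluster_ids):
    """Normalised frequency of each cluster id among the given labels."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in cluster_ids]


def jensen_shannon_divergence(p, q):
    """JSD (base-2 logarithm) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical cluster labels for the usages of one target word, one list
# per time period (assumed output of clustering the stacked matrix).
labels_period_1 = [0, 0, 0, 1]
labels_period_2 = [0, 1, 1, 1, 2, 2]

cluster_ids = sorted(set(labels_period_1) | set(labels_period_2))
p = cluster_distribution(labels_period_1, cluster_ids)
q = cluster_distribution(labels_period_2, cluster_ids)
change_score = jensen_shannon_divergence(p, q)
```

With a base-2 logarithm the score lies between 0 (identical cluster distributions) and 1 (no cluster shared between the periods), so a higher value would be read as a larger change in usage.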


“…There also exist clustering methods that select the optimal K automatically, e.g. DBSCAN or Affinity Propagation (Martinc et al, 2020). They nevertheless require method-specific parameter choices which indirectly determine the number of clusters.…”

Section: Usage Types (mentioning)

confidence: 99%

Analysing Lexical Semantic Change with Contextualised Word Representations

Giulianelli, Tredici, Fernández

2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics


This paper presents the first unsupervised approach to lexical semantic change that makes use of contextualised word representations. We propose a novel method that exploits the BERT neural language model to obtain representations of word usages, clusters these representations into usage types, and measures change along time with three proposed metrics. We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements. Our extensive qualitative analysis demonstrates that our method captures a variety of synchronic and diachronic linguistic phenomena. We expect our work to inspire further research in this direction.
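The excerpt above points out that methods such as DBSCAN pick the number of clusters themselves, yet their own parameters still determine it indirectly. The sketch below illustrates this on toy scalar data rather than real embeddings: a deliberately simplified DBSCAN (the function `dbscan_1d`, its arguments, and the data are all hypothetical and written for this illustration) yields a different number of clusters on the same points when only the neighbourhood radius `eps` changes.

```python
def dbscan_1d(points, eps, min_pts):
    """Simplified DBSCAN over scalar points; returns one label per point (-1 = noise)."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


def n_clusters(labels):
    """Number of clusters found, ignoring noise points."""
    return len({label for label in labels if label != -1})


# Toy data: two tight groups and one isolated point.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 5.0]

tight = dbscan_1d(points, eps=0.15, min_pts=2)
loose = dbscan_1d(points, eps=1.5, min_pts=2)
```

Here the smaller radius separates the two groups, while the larger one merges them; the isolated point is labelled as noise in both runs. The number of clusters is therefore not set directly but still follows from the parameter choice.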


“…This provides new opportunities for diachronic analysis: for example, it is possible to group similar token representations and measure a diversity of such representations, while predefined number of senses is not strictly necessary. Thus, currently there is an increased interest in the topic of language change detection using contextualized word embeddings [9,10,14,21,27,28].…”

Section: Contextualized Word Embeddings (mentioning)

confidence: 99%

“…[27] used averaged time-specific BERT representations and calculated cosine distance between averaged vectors of two time periods as a measure of semantic change. [28] tested Affinity Propagation algorithm for usage clusterization and showed that it is consistently better than k-Means. Finally, [21] applied approaches similar to [10], but also analyzing ELMo models and adding cosine similarity of average vectors as a measure.…”

Section: Contextualized Word Embeddings (mentioning)

confidence: 99%

ELMo and BERT in Semantic Change Detection for Russian

Rodina, Trofimova, Kutuzov, et al.

2021

Lecture Notes in Computer Science


We study the effectiveness of contextualized embeddings for the task of diachronic semantic change detection for Russian language data. Evaluation test sets consist of Russian nouns and adjectives annotated based on their occurrences in texts created in pre-Soviet, Soviet and post-Soviet time periods. ELMo and BERT architectures are compared on the task of ranking Russian words according to the degree of their semantic change over time. We use several methods for aggregation of contextualized embeddings from these architectures and evaluate their performance. Finally, we compare unsupervised and supervised techniques in this task.
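One of the excerpts above describes averaging the contextualized representations of a word within each time period and taking the cosine distance between the two averages as the measure of semantic change. A minimal sketch of that computation follows, using tiny made-up three-dimensional vectors in place of real BERT or ELMo embeddings; the function and variable names are illustrative assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Made-up token embeddings of one target word in two time periods.
usages_period_1 = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
usages_period_2 = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]]

shift = cosine_distance(mean_vector(usages_period_1),
                        mean_vector(usages_period_2))
```

Because each period is reduced to a single averaged vector, this measure needs no clustering step, at the cost of ignoring how the usages are distributed within a period.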

