Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, 2021
DOI: 10.18653/v1/2021.eacl-srw.25

Explaining and Improving BERT Performance on Lexical Semantic Change Detection

Abstract: Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question of why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low performance is largely due to orthographic information on the target word, which is encoded even in the higher layers …
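The diagnosis above suggests a simple preprocessing direction: collapse orthographic variation of the target word before encoding it. The following is a minimal, hypothetical sketch of that idea (the surface-form list, lemma, and example sentence are illustrative; this is not the paper's exact preprocessing pipeline):

```python
# Minimal sketch: remove orthographic variation of a target word by
# replacing every listed surface form with one normalized form (a
# hand-supplied lemma). Forms and lemma here are placeholders.
import re

def normalize_target(sentence: str, forms: list[str], lemma: str) -> str:
    # Match any listed surface form as a whole word, case-insensitively.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, forms)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lemma, sentence)

print(normalize_target("Planes banked; the plane lands.",
                       ["plane", "planes"], "plane"))
# -> "plane banked; the plane lands."  (all target forms collapse to "plane")
```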

Cited by 19 publications (20 citation statements) · References 24 publications (3 reference statements)
“…We use two types of BERT-base models as the MLM in our experiments: a publicly available pretrained model (MLM_pre) and a model fine-tuned from it (MLM_temp). The base model consists of 12 layers, which we use in two different configurations: (a) the last layer (MLM_{pre|temp,last}), and (b) the mean-pool over the last four layers (MLM_{pre|temp,four}), which has shown good performance across languages following Laicher et al. (2021). […] recommend using the mean pooling over all (12) hidden layers.…”
Section: Setup
Citation type: mentioning
Confidence: 99%
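The two layer configurations quoted above are straightforward to reproduce. Below is a minimal sketch assuming the Hugging Face transformers API; the model name, example sentence, and target-token index are placeholders rather than the cited setup (MLM_temp, for instance, would be a fine-tuned checkpoint):

```python
# Extract a target word's token vector from (a) the last BERT layer and
# (b) the mean-pool over the last four layers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_hidden_states=True)
model.eval()

inputs = tokenizer("The plane banked sharply before landing.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of 13 tensors (embedding layer + 12 encoder layers),
# each of shape (batch, seq_len, hidden_size).
hidden_states = outputs.hidden_states

target_idx = 2  # position of the target (sub)token; placeholder

# (a) last layer only (the "last" configuration)
vec_last = hidden_states[-1][0, target_idx]

# (b) mean-pool over the last four layers (the "four" configuration)
vec_four = torch.stack(hidden_states[-4:]).mean(dim=0)[0, target_idx]
```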
“…The best-performing models in the SemEval-2020 shared task on unsupervised LSC detection used static word embeddings (Schlechtweg et al., 2020). However, reported findings include contextualized approaches outperforming static embeddings (Kutuzov and Giulianelli, 2020); clustering of contextual embeddings performing worse than approaches that average contextual embeddings (Laicher et al., 2021) and approaches with static embeddings (Martinc et al., 2020b); and clustering contextualized embeddings performing better than averaging over them (Martinc et al., 2020a). Moreover, performance often differs across languages (Kutuzov and Giulianelli, 2020; Martinc et al., 2020b; Vani et al., 2020).…”
Section: Lexical Semantic Change Detection
Citation type: mentioning
Confidence: 99%
“…Moreover, performance often differs across languages (Kutuzov and Giulianelli, 2020; Martinc et al., 2020b; Vani et al., 2020). Performance on Swedish data is sometimes found to be worse than on, for example, English and German (Laicher et al., 2021; Martinc et al., 2020b), and sometimes better (Vani et al., 2020).…”
Section: Lexical Semantic Change Detection
Citation type: mentioning
Confidence: 99%
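To make the averaging-versus-clustering contrast in these excerpts concrete: the averaging approaches reduce all usages of a word in a corpus period to a single prototype vector and score change as the distance between the two periods' prototypes. A minimal sketch, assuming NumPy and an upstream routine (such as the BERT extraction above) that yields one embedding per usage:

```python
# Average-prototype change score: mean the per-usage token embeddings of a
# word within each period, then take cosine distance between the means.
import numpy as np

def average_prototype(vectors: np.ndarray) -> np.ndarray:
    # vectors: (n_usages, hidden_size) token embeddings from one period
    return vectors.mean(axis=0)

def cosine_change(vectors_t1: np.ndarray, vectors_t2: np.ndarray) -> float:
    p1 = average_prototype(vectors_t1)
    p2 = average_prototype(vectors_t2)
    cos = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return 1.0 - float(cos)  # higher = more change
```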
“…Previous work focused on discovering words that have undergone diachronic change in supervised settings (Kim et al., 2014; Basile and McGillivray, 2018; Tsakalidis et al., 2019). Recently, several works have demonstrated that contextualized word representations have dynamic representation capabilities (Pilehvar and Camacho-Collados, 2019; Chronis and Erk, 2020; Garí Soler and Apidianaki, 2021; Laicher et al., 2021; Qiu and Xu, 2022), which have been adopted in unsupervised methods to represent, cluster, and differentiate words across different time periods (Giulianelli et al., 2020; Montariol et al., 2021). Our method utilizes a syntax-based approach to detect the salient change words within the text sequence, making the process more interpretable (Merrill et al., 2019; Ryzhova et al.; Kutuzov et al., 2021).…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
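The clustering-based alternative mentioned in these excerpts instead groups usages from both periods into pseudo-senses and compares how each period distributes over them. A minimal sketch with illustrative choices (k-means and the Jensen-Shannon distance; the cited works also use, for example, affinity propagation):

```python
# Cluster pooled usages into pseudo-senses, then compare the two periods'
# cluster-frequency distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import KMeans

def clustering_change(vectors_t1: np.ndarray, vectors_t2: np.ndarray,
                      n_clusters: int = 8, seed: int = 0) -> float:
    X = np.vstack([vectors_t1, vectors_t2])
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(X)
    l1 = labels[: len(vectors_t1)]
    l2 = labels[len(vectors_t1):]
    # Relative frequency of each pseudo-sense cluster in each period.
    d1 = np.bincount(l1, minlength=n_clusters) / len(l1)
    d2 = np.bincount(l2, minlength=n_clusters) / len(l2)
    return float(jensenshannon(d1, d2))  # higher = more change
```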