Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, 2021
DOI: 10.18653/v1/2021.eacl-srw.25

Explaining and Improving BERT Performance on Lexical Semantic Change Detection

Abstract: Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question of why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low performance is largely due to orthographic information on the target word, which is encoded even in the higher layers …
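The diagnosis above suggests a simple preprocessing direction: collapse orthographic variation of the target word before encoding it. The following is a minimal, hypothetical sketch of that idea (the surface-form list, lemma, and example sentence are illustrative; this is not the paper's exact preprocessing pipeline):

```python
# Minimal sketch: remove orthographic variation of a target word by
# replacing every listed surface form with one normalized form (a
# hand-supplied lemma). Forms and lemma here are placeholders.
import re

def normalize_target(sentence: str, forms: list[str], lemma: str) -> str:
    # Match any listed surface form as a whole word, case-insensitively.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, forms)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lemma, sentence)

print(normalize_target("Planes banked; the plane lands.",
                       ["plane", "planes"], "plane"))
# -> "plane banked; the plane lands."  (all target forms collapse to "plane")
```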

Cited by 19 publications (20 citation statements) · References 24 publications (3 reference statements)
“…We use two types of BERT-base models as the MLM in our experiments: a publicly available pretrained model (MLM_pre) and a model fine-tuned from it (MLM_temp). The base model consists of 12 layers, which we use in two different configurations: (a) the last layer (MLM_{pre|temp,last}), and (b) the mean-pool over the last four layers (MLM_{pre|temp,four}), which has shown good performance across languages following Laicher et al. (2021). […] recommend using the mean pooling over all (12) hidden layers.…”
Section: Setup
Citation type: mentioning
Confidence: 99%
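The two layer configurations quoted above are straightforward to reproduce. Below is a minimal sketch assuming the Hugging Face transformers API; the model name, example sentence, and target-token index are placeholders rather than the cited setup (MLM_temp, for instance, would be a fine-tuned checkpoint):

```python
# Extract a target word's token vector from (a) the last BERT layer and
# (b) the mean-pool over the last four layers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_hidden_states=True)
model.eval()

inputs = tokenizer("The plane banked sharply before landing.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of 13 tensors (embedding layer + 12 encoder layers),
# each of shape (batch, seq_len, hidden_size).
hidden_states = outputs.hidden_states

target_idx = 2  # position of the target (sub)token; placeholder

# (a) last layer only (the "last" configuration)
vec_last = hidden_states[-1][0, target_idx]

# (b) mean-pool over the last four layers (the "four" configuration)
vec_four = torch.stack(hidden_states[-4:]).mean(dim=0)[0, target_idx]
```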
“…The best-performing models in the SemEval-2020 shared task on unsupervised LSC detection used static word embeddings (Schlechtweg et al., 2020). However, reported findings include contextualized approaches outperforming static embeddings (Kutuzov and Giulianelli, 2020); clustering of contextual embeddings performing worse than approaches that average contextual embeddings (Laicher et al., 2021) and approaches with static embeddings (Martinc et al., 2020b); and clustering contextualized embeddings performing better than averaging over them (Martinc et al., 2020a). Moreover, performance often differs across languages (Kutuzov and Giulianelli, 2020; Martinc et al., 2020b; Vani et al., 2020).…”
Section: Lexical Semantic Change Detection
Citation type: mentioning
Confidence: 99%
“…Moreover, performance often differs across languages (Kutuzov and Giulianelli, 2020; Martinc et al., 2020b; Vani et al., 2020). Performance on Swedish data is sometimes found to be worse than on, for example, English and German (Laicher et al., 2021; Martinc et al., 2020b), and sometimes better (Vani et al., 2020).…”
Section: Lexical Semantic Change Detection
Citation type: mentioning
Confidence: 99%
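To make the averaging-versus-clustering contrast in these excerpts concrete: the averaging approaches reduce all usages of a word in a corpus period to a single prototype vector and score change as the distance between the two periods' prototypes. A minimal sketch, assuming NumPy and an upstream routine (such as the BERT extraction above) that yields one embedding per usage:

```python
# Average-prototype change score: mean the per-usage token embeddings of a
# word within each period, then take cosine distance between the means.
import numpy as np

def average_prototype(vectors: np.ndarray) -> np.ndarray:
    # vectors: (n_usages, hidden_size) token embeddings from one period
    return vectors.mean(axis=0)

def cosine_change(vectors_t1: np.ndarray, vectors_t2: np.ndarray) -> float:
    p1 = average_prototype(vectors_t1)
    p2 = average_prototype(vectors_t2)
    cos = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return 1.0 - float(cos)  # higher = more change
```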
“…Previous work focused on discovering words that have undergone diachronic change in supervised settings (Kim et al., 2014; Basile and McGillivray, 2018; Tsakalidis et al., 2019). Recently, several works have demonstrated that contextualized word representations have dynamic representation capabilities (Pilehvar and Camacho-Collados, 2019; Chronis and Erk, 2020; Garí Soler and Apidianaki, 2021; Laicher et al., 2021; Qiu and Xu, 2022), which have been adopted in unsupervised methods to represent, cluster, and differentiate words across different time periods (Giulianelli et al., 2020; Montariol et al., 2021). Our method utilizes a syntax-based approach to detect the salient change words within the text sequence, making the process more interpretable (Merrill et al., 2019; Ryzhova et al.; Kutuzov et al., 2021).…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
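The clustering-based alternative mentioned in these excerpts instead groups usages from both periods into pseudo-senses and compares how each period distributes over them. A minimal sketch with illustrative choices (k-means and the Jensen-Shannon distance; the cited works also use, for example, affinity propagation):

```python
# Cluster pooled usages into pseudo-senses, then compare the two periods'
# cluster-frequency distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import KMeans

def clustering_change(vectors_t1: np.ndarray, vectors_t2: np.ndarray,
                      n_clusters: int = 8, seed: int = 0) -> float:
    X = np.vstack([vectors_t1, vectors_t2])
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(X)
    l1 = labels[: len(vectors_t1)]
    l2 = labels[len(vectors_t1):]
    # Relative frequency of each pseudo-sense cluster in each period.
    d1 = np.bincount(l1, minlength=n_clusters) / len(l1)
    d2 = np.bincount(l2, minlength=n_clusters) / len(l2)
    return float(jensenshannon(d1, d2))  # higher = more change
```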