2021
DOI: 10.48550/arxiv.2103.07259
Preprint

Explaining and Improving BERT Performance on Lexical Semantic Change Detection

Abstract: Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low performance is largely due to orthographic information on the target word, which is encoded even in the higher layers …
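The abstract concerns clusterings of contextualised (token-based) BERT vectors for lexical semantic change detection. As a rough illustration of that general setup, the following is a minimal sketch, assuming the Hugging Face transformers and scikit-learn libraries; the model name, layer choice, target-matching logic and clustering parameters are illustrative assumptions, not the authors' pipeline.

```python
# Minimal illustrative sketch (not the authors' code): extract contextualised
# BERT vectors for a target word across usage examples and cluster them, the
# general token-based setup the abstract investigates. Model name, layer
# choice, target matching and k-means settings are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def target_vector(sentence, target, layer=-1):
    """Hidden state of the first subtoken of `target` at the chosen layer."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]        # (seq_len, dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):          # locate target subtoken span
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i]
    raise ValueError(f"{target!r} not found in: {sentence}")

usages = [
    "the plane took off despite the heavy storm",
    "our plane was delayed for almost two hours",
    "a plane is a flat surface extending infinitely in two dimensions",
    "two points always lie on a common plane",
]
vectors = torch.stack([target_vector(s, "plane") for s in usages])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors.numpy())
print(labels)  # cluster assignments; shifts in cluster frequencies across corpora signal change
```

In a setup like this the surface form of the target word itself enters the encoder, which is one route by which the orthographic information discussed in the abstract can dominate the resulting clusters.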

Cited by 1 publication (3 citation statements)
References 17 publications
“…Due to their contextual resolution, transformer models such as BERT are able to attend to more features of the text, including orthographic information (Laicher et al. 2021) and, as we show in this paper, syntactic information. Furthermore, such models also encode positional information in the focal word embeddings and in those of the contextual sequence.…”
Section: A1 Challenges in the Application of Deep Language Models (mentioning)
confidence: 90%
“…Since our training and inference tasks are identical, we do not risk a gap between training and analysis (Rogers, Kovaleva, and Rumshisky 2020), and the accuracy of our approach depends directly on the model's native training performance. On the other hand, any technique that increases the performance of lexical substitutions (e.g., Amrami and Goldberg 2019; Laicher et al. 2021; Schick and Schütze 2019) applies to our model as well.…”
Section: A1 Challenges in the Application of Deep Language Models (mentioning)
confidence: 99%