“…A number of unsupervised approaches based on contextual embeddings have been proposed to sidestep the need for lexicographic resources (Schlechtweg et al., 2020; Tahmasebi et al., 2021). In general, these approaches follow a three-step scheme: i) extraction of an embedding for each occurrence of a target word from a contextual model such as BERT (Hu et al., 2019; Martinc et al., 2020a), ELMo (Kutuzov and Giulianelli, 2020; Rodina et al., 2020), or XLM-R (Cuba Gyllensten et al., 2020; Rother et al., 2020); ii) aggregation of the embeddings with a clustering algorithm such as K-Means (Giulianelli et al., 2020; Cuba Gyllensten et al., 2020), Affinity Propagation (Martinc et al., 2020a; Kutuzov and Giulianelli, 2020), or DBSCAN (Rother et al., 2020; Karnysheva and Schwarz, 2020); iii) comparison of the distribution of vectors over clusters across time periods, using a semantic distance measure such as the Jensen-Shannon divergence (Martinc et al., 2020a), Entropy Difference (Giulianelli et al., 2020), or the Wasserstein Distance (Montariol et al., 2021). The main limitation of applying clustering to word embeddings is scalability, in terms of both memory consumption and computation time.…”
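
To make the three-step scheme concrete, the following is a minimal sketch in Python, assuming BERT for step (i), K-Means for step (ii), and the Jensen-Shannon divergence for step (iii). The corpora, target word, and cluster count are illustrative placeholders, not taken from any of the cited works; Affinity Propagation or DBSCAN could replace K-Means in step (ii) without changing the overall structure.

```python
# Minimal sketch of the three-step LSC scheme (illustrative assumptions:
# BERT embeddings, K-Means clustering, Jensen-Shannon divergence).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed_occurrences(sentences, target):
    """Step i: one contextual embedding per occurrence of `target`."""
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    vectors = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        ids = enc["input_ids"][0].tolist()
        # Locate the target's wordpiece span inside the sentence.
        for i in range(len(ids) - len(target_ids) + 1):
            if ids[i:i + len(target_ids)] == target_ids:
                with torch.no_grad():
                    hidden = model(**enc).last_hidden_state[0]
                # Mean-pool the target's subword vectors.
                vectors.append(hidden[i:i + len(target_ids)].mean(0).numpy())
                break
    return np.vstack(vectors)

def cluster_distribution(labels, n_clusters):
    """Relative frequency of each cluster (smoothed to avoid zero bins)."""
    counts = np.bincount(labels, minlength=n_clusters) + 1e-9
    return counts / counts.sum()

# Hypothetical diachronic corpora: occurrences of "cell" in two periods.
sents_t1 = ["The monk prayed alone in his cell.", "Each prisoner had a cell."]
sents_t2 = ["The cell divides during mitosis.", "She answered her cell phone."]

emb_t1 = embed_occurrences(sents_t1, "cell")
emb_t2 = embed_occurrences(sents_t2, "cell")

# Step ii: cluster all occurrences jointly so cluster ids are shared
# across periods (k is a placeholder; Affinity Propagation infers it).
k = 2
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    np.vstack([emb_t1, emb_t2]))

# Step iii: compare the cluster distributions of the two time periods.
p = cluster_distribution(labels[:len(emb_t1)], k)
q = cluster_distribution(labels[len(emb_t1):], k)
# scipy returns the JS *distance*; squaring yields the divergence.
change_score = jensenshannon(p, q, base=2) ** 2
print(f"JSD change score for 'cell': {change_score:.3f}")
```

Note that the sketch embodies the scalability limitation named above: every occurrence vector must be held in memory before clustering, which is what makes these pipelines costly on large diachronic corpora.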