Word embeddings successfully capture lexical semantic information about words based on co-occurrence patterns extracted from large corpora (Mikolov et al., 2013a; Pennington et al., 2014; Mikolov et al., 2018) or knowledge bases (Bordes et al., 2011), with excellent results on several tasks, including word similarity (Collobert and Weston, 2008; Turian et al., 2010; Socher et al., 2011), Semantic Textual Similarity (Shao, 2017) and, more recently, unsupervised machine translation (Artetxe et al., 2019), inferring representations for rare words (Schick and Schütze, 2020), unsupervised word alignment (Jalili Sabet et al., 2020), and knowledge base probes (Dufter et al., 2021). In these tasks, word embeddings perform on par with or better than transformer-based language models such as BERT (Devlin et al., 2019), while requiring comparatively few resources for training and inference.
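As a minimal illustration of the word similarity setting mentioned above (not taken from any of the cited works), the sketch below scores word pairs by the cosine similarity of their embedding vectors. The vectors here are toy placeholders; in practice they would be loaded from a pre-trained model such as word2vec or GloVe.

```python
import numpy as np

# Hypothetical 4-dimensional embeddings for three words
# (placeholders standing in for pre-trained vectors).
embeddings = {
    "king":  np.array([0.80, 0.10, 0.70, 0.20]),
    "queen": np.array([0.75, 0.15, 0.72, 0.25]),
    "apple": np.array([0.10, 0.90, 0.05, 0.80]),
}

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Semantically related words should score higher than unrelated ones.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```

Because similarity reduces to a single dot product over dense vectors, this kind of lookup is far cheaper at inference time than running a full transformer forward pass, which is the resource contrast the paragraph draws against BERT.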