“…In (i), semantic vector spaces, each word is represented as two vectors reflecting its co-occurrence statistics at different periods of time (Gulordava and Baroni, 2011; Kim et al., 2014; Xu and Kemp, 2015; Eger and Mehler, 2016; Hamilton et al., 2016a,b; Hellrich and Hahn, 2016; Rosenfeld and Erk, 2018). LSC is typically measured by the cosine distance (or an alternative metric) between the two vectors, or by differences in the contextual dispersion of the two vectors (Kisselew et al., 2016; Schlechtweg et al., 2017). (ii) Diachronic topic models infer, for each word, a probability distribution over word senses (or topics), which are in turn modeled as distributions over words (Wang and McCallum, 2006; Bamman and Crane, 2011; Wijaya and Yeniterzi, 2011; Lau et al., 2012; Mihalcea and Nastase, 2012; Cook et al., 2014; Frermann and Lapata, 2016).…”
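The vector-space measure described in (i) can be sketched as follows. This is a minimal illustration, not any cited system's implementation: the function name `cosine_distance` and the toy period-specific vectors are hypothetical, standing in for a word's co-occurrence vectors from two time periods.

```python
import math

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy co-occurrence vectors for the same word at two periods (made-up values).
vec_period_1 = [0.9, 0.1, 0.0]
vec_period_2 = [0.2, 0.8, 0.3]

# A larger distance is taken as stronger evidence of lexical semantic change (LSC).
print(cosine_distance(vec_period_1, vec_period_2))
```

Identical vectors give a distance of 0, orthogonal ones a distance of 1, so the score grows as the word's distributional profile drifts between the two periods.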