“…This includes mainly (i) semantic similarity models, which assume one sense for each word and measure its spatial displacement with a similarity metric (such as cosine) in a semantic vector space (Gulordava and Baroni, 2011; Xu and Kemp, 2015; Eger and Mehler, 2016; Hellrich and Hahn, 2016; Hamilton et al., 2016a,b), and (ii) word sense induction (WSI) models, which infer for each word a probability distribution over different word senses (or topics), each in turn modeled as a distribution over words (Wang and McCallum, 2006; Bamman and Crane, 2011; Wijaya and Yeniterzi, 2011; Lau et al., 2012; Mihalcea and Nastase, 2012; Frermann and Lapata, 2016).…”
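
The first model family can be illustrated with a minimal sketch: a word's displacement is taken as one minus the cosine similarity between its vectors from two periods. This is not the method of any specific cited paper. The function names and the toy vectors are hypothetical, and the sketch assumes the two vectors already live in a shared (aligned) vector space.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_displacement(vec_early, vec_late):
    """Cosine distance between a word's vectors from two periods.

    Assumption: both vectors come from the same (aligned) space;
    0 means no change in direction, larger values mean more change.
    """
    return 1.0 - cosine_similarity(vec_early, vec_late)


# Hypothetical toy vectors, invented purely for illustration.
stable_word = ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
shifted_word = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

print(semantic_displacement(*stable_word))
print(semantic_displacement(*shifted_word))
```

Under these assumptions, a word whose vector keeps its direction gets a displacement near zero, while a word whose vector rotates to an orthogonal direction gets a displacement near one.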