“…There are three types of similarity learning in NLP. The supervised paradigm differs from typical supervised learning in that training examples are cast into pairwise constraints (Yang and Jin, 2006), as in cross-lingual word embedding learning based on word-level alignments (Faruqui and Dyer, 2014) and zero-shot utterance/document classification based on utterance/document-level annotations (Yazdani and Henderson, 2015; Nam et al., 2016; Pappas and Henderson, 2019). The unsupervised paradigm aims to learn an underlying low-dimensional space that preserves the relationships among most of the observed data, as in word embedding learning (Collobert et al., 2011; Mikolov et al., 2013; Pennington et al., 2014; Levy and Goldberg, 2014).…”
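To make the supervised paradigm concrete, the step of casting class-labeled examples into pairwise constraints can be sketched as follows. This is a minimal illustration under the usual convention that examples sharing a label form must-link (similar) pairs and examples with different labels form cannot-link (dissimilar) pairs; the function name and toy data are hypothetical, not from the cited works.

```python
from itertools import combinations

def to_pairwise_constraints(labeled_examples):
    """Cast (example, label) pairs into must-link / cannot-link constraints.

    Same label  -> must-link pair (the learner should score them as similar).
    Other label -> cannot-link pair (the learner should score them as dissimilar).
    """
    must_link, cannot_link = [], []
    for (x1, y1), (x2, y2) in combinations(labeled_examples, 2):
        if y1 == y2:
            must_link.append((x1, x2))
        else:
            cannot_link.append((x1, x2))
    return must_link, cannot_link

# Toy labeled data (hypothetical): word -> coarse class
examples = [("cat", "animal"), ("dog", "animal"), ("car", "vehicle")]
must, cannot = to_pairwise_constraints(examples)
# must   -> [("cat", "dog")]
# cannot -> [("cat", "car"), ("dog", "car")]
```

A similarity model is then trained on these pairs rather than on the original labels, which is what distinguishes this setup from typical supervised classification.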