“…For example, Short Text Similarity (STS) evaluates the semantic similarity between two short text snippets (target domain). Since they are short (only a few sentences, e.g., a tweet), standard statistical …

Bayesian linear and logistic regression (Friedman, Hastie, & Tibshirani, 2001): (Sultan et al, 2016)
Probabilistic matrix factorization model (PMF) (Mnih & Salakhutdinov, 2008): (Jing et al, 2014), (Iwata & Koh, 2015)
Flexible mixture model (Si & Jin, 2003): (Li et al, 2009)
Polylingual topic models (Mimno, Wallach, Naradowsky, Smith, & McCallum, 2009): (Hu et al, 2014)
Probabilistic latent semantic analysis (PLSA) (Hofmann, 1999): (Xue et al, 2008), (Gao & Li, 2011), (Zhuang et al, 2013), (Zhuang et al, 2010), (Zhuang et al, 2012), (Li et al, 2012), (Zhai et al, 2004), (… et al, 2009)
Latent Dirichlet allocation (LDA) (Blei et al, 2003): (Wu & Chien, 2010), (Jin et al, 2011), (Chen et al, 2015), (Yu & Aloimonos, 2010), (Yang et al, 2011), (Tang et al, 2012), (Phan et al, 2011)
Probabilistic linear discriminant analysis (PLDA) (Prince & Elder, 2007): (Hong, Zhang, Li, Wan, & Tong, 2016), (López & Lleida, 2012)
Conditional random field (CRF) (Lafferty et al, 2001): (Nallapati, Surdeanu, & Manning, 2010), (Finkel & Manning, 2009), (Arnold et al, 2008)
Hierarchical latent Dirichlet allocation (hLDA)…”