“…Balog et al. () to compare a VSM representation with probabilistic Latent Semantic Indexing (pLSI), a topic model representation, showing that the former achieves significantly better results. Gruetze, Kasneci, Zuo, and Naumann () also compare a VSM representation with a probabilistic model and reach the same conclusion. Artiles, Amigó, and Gonzalo () study the impact of NEs on this problem and conclude that these features provide no substantial advantage over a combination of simple features that require no linguistic preprocessing (local and global tokens, snippets, n-grams, and so on).…”