“…Various subsequent approaches use probability-based variants of term occurrence measures, such as the χ² test, log-likelihood (Dunning, 1993) and mutual information (Church and Hanks, 1990), or attempt to combine statistical measures with various types of linguistic and stop-word filters to refine the keyword results. Considerations regarding term ambiguity and variation also led to rule-based approaches (Jacquemin, 2001) and resource-based approaches exploiting existing thesauri and lexica, such as UMLS (Hliaoutakis et al., 2009) or WordNet (Aggarwal et al., 2018). Knowledge-poor statistical approaches, such as Latent Semantic Analysis (Deerwester et al., 1990) and Latent Dirichlet Allocation (Blei et al., 2003), attempt to detect document content in an unsupervised manner while reducing the dimensionality of the feature space of other bag-of-words approaches, but are also sensitive to sparse data and variation in short texts.…”