2015
DOI: 10.1007/s10791-015-9254-2
Supervised topic models with word order structure for document classification and retrieval learning

Abstract: One limitation of most existing probabilistic latent topic models for document classification is that the topic model itself does not consider useful side-information, namely the class labels of documents. Topic models that do consider this side-information, popularly known as supervised topic models, in turn do not consider the word order structure in documents. One motivation for considering word order structure is to capture the semantic fabric of the document. We investigate a low-dimensional lat…

Cited by 17 publications (8 citation statements)
References 100 publications
“…We have adopted the same preprocessing strategy as for the categorization task, with the exception of OHSUMED, for which suitable LTR features are already given. For all other datasets we used the Terrier LTR framework to generate the six standard LTR document features as described in (Jameel et al, 2015). The document vectors were then concatenated with these six features.…”
Section: Document Embedding Results
confidence: 99%
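The feature construction described in this excerpt can be sketched as follows. This is a hypothetical illustration, not the cited pipeline: the function name, embedding dimensionality, and feature values are assumptions; only the idea of appending six LTR features to a document vector comes from the quote.

```python
import numpy as np

def build_ltr_input(doc_embedding: np.ndarray, ltr_features: np.ndarray) -> np.ndarray:
    """Concatenate a document vector with its six standard LTR features.

    Hypothetical sketch: the real setup uses Terrier-generated features
    (e.g. TF, IDF, BM25-style scores); the values below are placeholders.
    """
    assert ltr_features.shape == (6,), "expected six standard LTR features"
    return np.concatenate([doc_embedding, ltr_features])

embedding = np.random.rand(100)                        # assumed 100-d document vector
features = np.array([3.0, 1.2, 4.1, 17.5, 0.8, 2.2])   # placeholder LTR feature values
x = build_ltr_input(embedding, features)
print(x.shape)  # (106,)
```

The concatenated vector is what a learning-to-rank model would then consume as its per-document input.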
“…Based on the frequency of occurrence of the index terms, this scheme attempts to find the most likely category into which new documents should be classified [6][7]. Representative statistical document classification methods fall into two groups: (1) methods based on Bayesian probability and (2) methods based on vector similarity. In the Bayesian-probability-based method, the probability that a document belongs to each category is estimated whenever the index terms extracted from an arbitrary document appear.…”
Section: Related Studies
confidence: 99%
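The Bayesian-probability approach described in this excerpt can be sketched as a minimal multinomial Naive Bayes classifier: category probabilities are estimated from the index terms appearing in a new document. This is a generic sketch with toy data and Laplace smoothing, not the method of the cited paper; all names and documents below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, label). Returns term counts, label counts, vocabulary."""
    term_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return term_counts, label_counts, vocab

def classify(tokens, term_counts, label_counts, vocab):
    """Pick the category with the highest posterior log-probability."""
    n_docs = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label, n in label_counts.items():
        lp = math.log(n / n_docs)            # category prior
        total = sum(term_counts[label].values())
        for t in tokens:                     # Laplace-smoothed term likelihoods
            lp += math.log((term_counts[label][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [(["topic", "model", "latent"], "ml"),
        (["court", "law", "judge"], "legal")]
model = train(docs)
print(classify(["latent", "topic"], *model))  # → ml
```

Each index term observed in the new document shifts the posterior toward the categories in which that term was frequent during training, which is exactly the frequency-of-occurrence intuition in the quote.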
“…We learn the parameters of the model using the training data (75%), and report the perplexity results on the held-out data (25%). For the parametric topic models, we use a tuning set to determine the number of topics following the tuning procedure described in [13]. Our objective is to compare how well our model has learned all parameters and how it performs in terms of its generalization ability.…”
Section: Lifestyle Pattern Quality Evaluation
confidence: 99%
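The held-out perplexity measure used in this excerpt can be sketched as follows: given the per-word log-probabilities a trained topic model assigns to held-out documents, perplexity is exp of the negated average log-likelihood per word. The log-probabilities below are placeholder values, not output of the cited model.

```python
import math

def perplexity(log_probs_per_doc):
    """log_probs_per_doc: list of lists of per-word log-probabilities
    assigned by a trained model to held-out documents."""
    total_ll = sum(sum(doc) for doc in log_probs_per_doc)
    n_words = sum(len(doc) for doc in log_probs_per_doc)
    return math.exp(-total_ll / n_words)

# Two toy held-out documents: 50 words at p=0.01 each, 30 words at p=0.02 each.
held_out = [[math.log(0.01)] * 50, [math.log(0.02)] * 30]
print(perplexity(held_out))
```

Lower perplexity on the 25% held-out split indicates that the model generalizes better, which is the comparison the excerpt describes.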