Topics in Tweets: A User Study of Topic Coherence Metrics for Twitter Data

Fang, Anjie; Macdonald, Craig; Ounis, Iadh; Habel, Philip

doi:10.1007/978-3-319-30671-1_36

Cited by 28 publications

(38 citation statements)

References 16 publications

“…Recently, a large-scale user study in [2] evaluated several coherence metrics including the Wikipedia PMI-based and WordNet-based metrics on tweets. Fang et al. [2] showed that a newly proposed coherence metric leveraging a Twitter background dataset, called the Twitter PMI-based metric (hereafter, T-PMI), has a remarkably high agreement with human judgements on tweet corpora.…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…For example, Newman et al. [8] proposed a Pointwise Mutual Information (PMI)-based metric using Wikipedia as a background dataset to evaluate the coherence of a topic from news articles and books. More recently, a new coherence PMI-based metric using a Twitter background has been proposed for tweet corpora, and was found to be the closest to human judgements [2].…”

Section: Introduction (mentioning)

confidence: 99%
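
The PMI-based coherence idea mentioned in the statements above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: it assumes document-level co-occurrence counts over a tokenised background corpus, averages PMI over all pairs of a topic's top words, and uses a small smoothing constant `eps`. The function name `pmi_coherence` and its parameters are hypothetical.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, background_docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    Assumptions (not taken from the cited papers): probabilities are
    estimated from document-level co-occurrence in `background_docs`,
    and `eps` smooths zero counts so the logarithm stays defined.
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")
    docs = [set(doc) for doc in background_docs]
    if not docs:
        raise ValueError("background corpus is empty")
    n_docs = len(docs)

    def prob(*words):
        # Fraction of background documents containing all given words.
        return sum(1 for d in docs if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        joint = prob(w1, w2)
        scores.append(math.log((joint + eps) / (prob(w1) * prob(w2) + eps)))
    return sum(scores) / len(scores)
```

Under these assumptions, words that tend to appear in the same background documents receive a positive score, while words that never co-occur receive a strongly negative one. Swapping the background corpus (for example, Wikipedia articles versus tweets) changes the estimates, which is the distinction the statements above draw between the Wikipedia-based and Twitter-based variants.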

“…In this paper, we conduct large-scale experiments on two Twitter datasets to investigate the coherence of ranked topics generated by three topic modelling approaches (LDA, TLDA and PLDA). Inspired by the precision at n evaluation metric, we also explore the coherence at n scores of the generated topic models by using the state-of-the-art Twitter PMI-based coherence metric [2], which we describe in Section 3. The contributions of this paper are as follows: 1) we examine which of the three existing topic modelling approaches for Twitter data generates more coherent topics, 2) we analyse the relationship between the coherence of a topic model and the number of topics (K), and 3) we evaluate the utility of the coherence at n coherence metric for a topic model.…”

Section: Introduction (mentioning)

confidence: 99%

Examining the Coherence of the Top Ranked Tweet Topics

Fang, Macdonald, Ounis, et al. 2016

Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval

Self Cite

Topic modelling approaches help scholars to examine the topics discussed in a corpus. Due to the popularity of Twitter, two distinct methods have been proposed to accommodate the brevity of tweets: the tweet pooling method and Twitter LDA. Both of these methods demonstrate a higher performance in producing more interpretable topics than the standard Latent Dirichlet Allocation (LDA) when applied on tweets. However, while various metrics have been proposed to estimate the coherence of the generated topics from tweets, the coherence of the top ranked topics, those that are most likely to be examined by users, has not been investigated. In addition, the effect of the number of generated topics K on the topic coherence scores has not been studied. In this paper, we conduct large-scale experiments using three topic modelling approaches over two Twitter datasets, and apply a state-of-the-art coherence metric to study the coherence of the top ranked topics and how K affects such coherence. Inspired by ranking metrics such as precision at n, we use coherence at n to assess the coherence of a topic model. To verify our results, we conduct a pairwise user study to obtain human preferences over topics. Our findings are threefold: we find evidence that Twitter LDA outperforms both LDA and the tweet pooling method because the top ranked topics it generates have more coherence; we demonstrate that a larger number of topics (K) helps to generate topics with more coherence; and finally, we show that coherence at n is more effective when evaluating the coherence of a topic model than the average coherence score.
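
The contrast the abstract draws between coherence at n and the average coherence score can be illustrated with a short sketch. It assumes that each topic already has a coherence score and that topics are ranked by that score in descending order; the function names `coherence_at_n` and `average_coherence` are hypothetical, and the paper's own ranking procedure may differ.

```python
def coherence_at_n(topic_scores, n):
    """Mean coherence of the n highest-scoring topics.

    Assumption: topics are ranked by their coherence score, highest first.
    If fewer than n topics exist, all of them are averaged.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not topic_scores:
        raise ValueError("no topic scores given")
    top = sorted(topic_scores, reverse=True)[:n]
    return sum(top) / len(top)


def average_coherence(topic_scores):
    """Mean coherence over all topics, for comparison."""
    if not topic_scores:
        raise ValueError("no topic scores given")
    return sum(topic_scores) / len(topic_scores)
```

With scores such as `[0.1, 0.9, 0.5, 0.3]`, the coherence at 2 averages only the two best topics, whereas the plain average is pulled down by the weaker ones. This reflects the abstract's motivation that the top ranked topics are the ones users are most likely to examine.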

“…A higher score indicates that the topic is easier to understand. Following [22,23], we use a word embedding (WE) representations-based coherence metric to evaluate the coherence of the generated topics, which has been reported to have a high agreement with human judgments. In order to capture the semantic similarity of the latest hashtags and Twitter handle names, we train our WE model using 200 million English tweets posted from 08/2015 to 08/2016.…”

Section: Methods (mentioning)

confidence: 99%
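
A word-embedding-based coherence score of the kind mentioned in the statement above could look roughly like the sketch below. It assumes coherence is the mean pairwise cosine similarity between the embedding vectors of a topic's top words, with out-of-vocabulary words skipped; the cited work's exact formulation is not given here, and `we_coherence` and the `embeddings` mapping are hypothetical names.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def we_coherence(topic_words, embeddings):
    """Mean pairwise cosine similarity of the topic words' embeddings.

    Assumptions: `embeddings` maps a word to its vector, and words that
    are missing from the mapping are ignored.
    """
    vectors = [embeddings[w] for w in topic_words if w in embeddings]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two in-vocabulary topic words")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

In practice the `embeddings` mapping would come from a model trained on a suitable background corpus, such as the tweet collection the statement describes, so that recent hashtags and handle names have vectors.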

Exploring Time-Sensitive Variational Bayesian Inference LDA for Social Media Data

Fang, Macdonald, Ounis, et al. 2017

Lecture Notes in Computer Science

Self Cite

Abstract. There is considerable interest among both researchers and the mass public in understanding the topics of discussion on social media as they occur over time. Scholars have thoroughly analysed sampling-based topic modelling approaches for various text corpora including social media; however, another LDA topic modelling implementation, Variational Bayesian (VB), has not been well studied, despite its known efficiency and its adaptability to the volume and dynamics of social media data. In this paper, we examine the performance of the VB-based topic modelling approach for producing coherent topics, and further, we extend the VB approach by proposing a novel time-sensitive Variational Bayesian implementation, denoted as TVB. Our newly proposed TVB approach incorporates time so as to increase the quality of the generated topics. Using a Twitter dataset covering 8 events, our empirical results show that the coherence of the topics in our TVB model is improved by the integration of time. In particular, through a user study, we find that our TVB approach generates less mixed topics than state-of-the-art topic modelling approaches. Moreover, our proposed TVB approach can more accurately estimate topical trends, making it particularly suitable to assist end-users in tracking emerging topics on social media.

“…a Twitter user) into a community. However, while topic modelling approaches and classification techniques have been widely used, challenges still exist, such as 1) existing topic modelling approaches can generate topics lacking of coherence for social media data [4,10]; 2) it is not easy to evaluate the coherence of topics [2,3]; 3) it can be challenging to generate a large training dataset for developing a social media user classifier. Hence, we identify four tasks to solve these problems and assist social scientists.…”

Section: Introduction (mentioning)

confidence: 99%