Improving tweet stream classification by detecting changes in word probability

Nishida, K.; Hoshide, Takahide; Fujimura, Kikuo

doi:10.1145/2348283.2348412

Cited by 29 publications

(22 citation statements)

References 34 publications


“…Besides traditional text classification techniques, some recent works have focused on short text classification [24], [25]. [25] has extracted eight features for 5-class classification (i.e., news, events, opinions, deals, and private messages).…”

Section: Methods (mentioning)

confidence: 99%


We can learn your #hashtags: Connecting tweets to explicit topics

Feng, Wang (2014)

2014 IEEE 30th International Conference on Data Engineering


In Twitter, users can annotate tweets with hashtags to indicate the ongoing topics. Hashtags provide users a convenient way to categorize tweets. From the system's perspective, hashtags play an important role in tweet retrieval, event detection, topic tracking, and advertising, etc. Annotating tweets with the right hashtags can lead to a better user experience. However, two problems remain unsolved during an annotation: (1) Before the user decides to create a new hashtag, is there any way to help her/him find out whether some related hashtags have already been created and widely used? (2) Different users may have different preferences for categorizing tweets. However, little work has been done to study the personalization issue in hashtag recommendation. To address the above problems, we propose a statistical model for personalized hashtag recommendation in this paper. With millions of tweet-hashtag pairs being published every day, we are able to learn the complex mappings from tweets to hashtags with the wisdom of the crowd. Two questions are answered in the model: (1) Different from traditional item recommendation data, users and tweets in Twitter have rich auxiliary information like URLs, mentions, locations, social relations, etc. How can we incorporate these features for hashtag recommendation? (2) Different hashtags have different temporal characteristics. Hashtags related to breaking events in the physical world have a strong rise-and-fall temporal pattern while some other hashtags remain stable in the system. How can we incorporate hashtag-related features to serve for hashtag recommendation? With all the above factors considered, we show that our model successfully outperforms existing methods on real datasets crawled from Twitter.



“…Limited by its small parameter space, this method cannot handle classification problem with millions of hashtags as class labels. [24] has further considered changes in word probability to classify Tweet stream, which is only based on texts. It does not consider Twitter-specific features nor user's preferences.…”

Section: Methods (mentioning)

confidence: 99%

We can learn your #hashtags: Connecting tweets to explicit topics

Feng, Wang (2014)

2014 IEEE 30th International Conference on Data Engineering


“…In the same year Alvanaki et al [1] proposed a system "enBlogue", which analyzes statistics about tags and tag pairs for identifying unusual shifts in correlations. Further recent work proposed by Nishida et al [15] shows a classification model of tweet streams for identifying changes in statistical properties on word basis, which is used for topic classification. Also in the same year Zimmermann et al [23] propose a text stream clustering method that detects, tracks and updates large and small bursts of news in a two-level topic hierarchy.…”

Section: Related Work (mentioning)

confidence: 99%

Event identification for local areas using social media streaming data

Weiler, Scholl, Wanner, et al. (2013)

Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks


Unprecedented success and active usage of social media services result in massive amounts of user-generated data. An increasing interest in the contained information from social media data leads to more and more sophisticated analysis and visualization applications. Because of the fast pace and distribution of news in social media data it is an appropriate source to identify events in the data and directly display their occurrence to analysts or other users. This paper presents a method for event identification in local areas using the Twitter data stream. We implement and use a combined log-likelihood ratio approach for the geographic and time dimension of real-life Twitter data in predefined areas of the world to detect events occurring in the message contents. We present a case study with two interesting scenarios to show the usefulness of our approach.


“…For example, opinions are not the focus of our work. Nishida et al [2012] presented a wide range of tweet classification frameworks using a temporally aware Naïve Bayes classifier. Their experiments were conducted on a data set in which classes were defined based on their hashtags.…”

Section: Tweet Classification (mentioning)

confidence: 99%

Classifying microblogs for disasters

Karimi, Yin, Paris (2013)

Proceedings of the 18th Australasian Document Computing Symposium


Monitoring social media in critical disaster situations can potentially assist emergency and media personnel to deal with events as they unfold, and focus their resources where they are most needed. We address the issue of filtering massive amounts of Twitter data to identify high-value messages related to disasters, and to further classify disaster-related messages into those pertaining to particular disaster types, such as earthquake, flooding, fire, or storm. Unlike post-hoc analysis that most previous studies have done, we focus on building a classification model on past incidents to detect tweets about current incidents. Our experimental results demonstrate the feasibility of using classification methods to identify disaster-related tweets. We analyse the effect of different features in classifying tweets and show that using generic features rather than incident-specific ones leads to better generalisation on the effectiveness of classifying unseen incidents.

