Microblogging sites such as Twitter and Weibo are increasingly being used to enhance situational awareness during natural and man-made disaster events such as floods, earthquakes, and bomb blasts. During any such event, thousands of microblogs (tweets) are posted in short intervals of time. Typically, only a small fraction of these tweets contribute to situational awareness, while the majority merely reflect the sentiment or opinion of people. Real-time extraction of tweets that contribute to situational awareness is especially important for relief operations, when time is critical. However, automatically differentiating such tweets from those that reflect opinion or sentiment is a non-trivial challenge, mainly because tweets are very short and are written informally (with frequent use of emoticons, abbreviations, and so on). This study applies Natural Language Processing (NLP) techniques to address this challenge. We extract low-level syntactic features from the text of tweets, such as the presence of specific types of words and parts-of-speech, to develop a classifier that distinguishes between tweets that contribute to situational awareness and tweets that do not. Experiments over tweets related to four diverse disaster events show that the proposed features identify situational awareness tweets with significantly higher accuracy than classifiers based on standard bag-of-words models.
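To make the idea of low-level syntactic features concrete, the following is a minimal sketch of a feature extractor. The abstract does not enumerate the features used, so the word lists, the regular expressions, and the function name `extract_features` are all illustrative assumptions rather than the study's actual feature set. A real system would likely also use a part-of-speech tagger, which is omitted here.

```python
import re

# Hypothetical word lists; the source does not list its features.
PERSONAL_PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your"}
OPINION_WORDS = {"pray", "sad", "hope", "sorry", "terrible", "shocked"}

# Simple patterns for Western-style emoticons, numerals, and word tokens.
EMOTICON_RE = re.compile(r"[:;=][\-']?[\)\(DPp]")
NUMERAL_RE = re.compile(r"\b\d+(?:[.,]\d+)?\b")
TOKEN_RE = re.compile(r"[a-z']+")


def extract_features(tweet):
    """Map one tweet to a small dictionary of surface-level features."""
    tokens = TOKEN_RE.findall(tweet.lower())
    return {
        # Counts of casualties, distances, etc. often appear as numerals.
        "num_numerals": len(NUMERAL_RE.findall(tweet)),
        # Pronouns and opinion words tend to signal personal sentiment.
        "num_pronouns": sum(t in PERSONAL_PRONOUNS for t in tokens),
        "num_opinion_words": sum(t in OPINION_WORDS for t in tokens),
        "has_emoticon": bool(EMOTICON_RE.search(tweet)),
        "has_exclamation": "!" in tweet,
    }


if __name__ == "__main__":
    print(extract_features("Flood update: 120 people rescued, 3 bridges closed"))
    print(extract_features("I am so sad :( praying for everyone!"))
```

Feature dictionaries of this kind could then be passed to any standard classifier. Unlike a bag-of-words model, the features do not depend on the vocabulary of a particular event.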
We focus on three aspects of the early spread of a hashtag in order to predict whether it will go viral: the network properties of the subset of users tweeting the hashtag, its geographical properties, and, most importantly, its conductance-related properties. One of our significant contributions is the discovery of the critical role that conductance-based features play in the successful prediction of virality. More specifically, we show that the first derivative of the conductance gives an early indication of whether the hashtag will go viral. We present a detailed experimental evaluation of the effect of our various categories of features on the virality prediction task. Compared with the baselines and the state-of-the-art techniques proposed in the literature, our feature set achieves significantly better accuracy on a large dataset of 7.7 million users and all their tweets over a period of a month, as well as on existing datasets.
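As an illustration of the conductance-based features, the sketch below computes the conductance of the set of users who have tweeted a hashtag, and its discrete first derivative across successive time snapshots. It assumes the standard graph-theoretic definition (edges leaving the set divided by the smaller of the two volumes) on an undirected follower graph. The abstract does not state the exact definition or graph construction used, so the function names and the adjacency-dictionary representation are assumptions.

```python
def conductance(adj, members):
    """Conductance of the node set `members` in an undirected graph.

    `adj` maps each node to the set of its neighbours. Conductance is
    cut(S, V \\ S) / min(vol(S), vol(V \\ S)), where vol is the sum of degrees.
    """
    members = set(members)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in members for v in adj[u] if v not in members)
    vol_in = sum(len(adj[u]) for u in members)
    vol_out = sum(len(adj[u]) for u in adj if u not in members)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def conductance_derivative(adj, adopters_over_time):
    """Finite-difference derivative of conductance over time snapshots.

    `adopters_over_time` is a list of node sets, one per time step, each
    holding the users who have tweeted the hashtag so far.
    """
    values = [conductance(adj, s) for s in adopters_over_time]
    return [later - earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    # Toy path graph a - b - c - d (hypothetical data).
    graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    snapshots = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
    print(conductance_derivative(graph, snapshots))
```

A sustained change in these differences during the first hours of a hashtag's life is the kind of early signal the abstract refers to. How that signal is thresholded or fed to a classifier is not specified in the source.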