Microblogs have been used as an information source for detecting real-world events. Several related studies have retrieved road traffic events based on textual content. Beyond detecting traffic incidents, we found that it is also necessary to recognize statuses that describe similar traffic incidents. A better representation of traffic information will help the related parties handle traffic incidents. This study proposes a text-based approach for identifying similar traffic incidents from Twitter posts. The proposed approach extracts traffic incident information and calculates information weights based on the textual similarity of the extracted traffic incident information. We evaluate the proposed method using a traffic incident information retrieval system and an Indonesian-language corpus of traffic incident tweets. The best average F-measure, 70%, was achieved by the retrieval system tested with the Jaccard coefficient. Text matching measures such as the Jaccard coefficient are therefore more suitable for very short text documents such as extracted tweet documents. The experimental results lead to the conclusion that the proposed approach can be used to identify similar traffic incident information from Twitter.
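The Jaccard coefficient mentioned in this abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, which the abstract does not specify; the function name `jaccard` and the sample strings are hypothetical and are not taken from the study.

```python
def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard coefficient between the token sets of two short texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    set_a = set(text_a.lower().split())
    set_b = set(text_b.lower().split())
    if not set_a and not set_b:
        # Convention chosen for this sketch: two empty texts count as identical.
        return 1.0
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical extracted incident descriptions, for illustration only.
score = jaccard("accident on toll road", "accident on main road")
```

Because the measure depends only on which tokens are shared, it stays well defined for documents that are only a few words long, which is consistent with the abstract's remark about very short texts.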
Over the past few years, people have been able to get and share information easily through social media. Some of that information may be false, created by buzzer accounts that intend to steer people toward a specific opinion. Politicians often use buzzer accounts on social media to maintain a good public image. The main characteristic of a buzzer account is that it uploads the same content repeatedly within a certain period. Before analyzing data taken from social media such as Twitter, we need a buzzer detection system to filter out data from buzzer users. This research attempts to build a buzzer detection system using text processing and a classification method. We use the similarity of tweets as a feature for the buzzer detection system by applying cosine similarity to the Term Frequency - Inverse Document Frequency (TF-IDF) features of the tweets. In addition, we use the number of followers, the number of followings, the intensity of tweets, the ratio of retweets, and the ratio of tweets that contain links as further features. These features serve as inputs to a Support Vector Machine model that determines whether an account is a buzzer or not. The system shows promising results, with 89% accuracy, 86.67% precision, 70.91% recall, and a 78% F1-score.
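The tweet-similarity feature described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: whitespace tokenization, a smoothed IDF formula, and the mean pairwise similarity as the per-account aggregate are all choices made here, since the abstract does not specify them. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document.

    Assumptions: whitespace tokens, raw term counts, smoothed IDF.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(tweets):
    """Hypothetical account-level feature: average similarity over all tweet pairs."""
    vectors = tfidf_vectors(tweets)
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
```

An account that posts the same text repeatedly would score close to 1 on this feature, while an account with unrelated tweets would score close to 0. The resulting value could then be combined with the account-level features listed in the abstract before being passed to a classifier.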