Background
The vaccination uptake rates of the human papillomavirus (HPV) vaccine remain low despite the fact that the effectiveness of HPV vaccines has been established for more than a decade. Vaccine hesitancy is in part due to false information about HPV vaccines on social media. Combating false HPV vaccine information is a reasonable step to addressing vaccine hesitancy.
Objective
Given the substantial harm of false HPV vaccine information, there is an urgent need to identify false social media messages before it goes viral. The goal of the study is to develop a systematic and generalizable approach to identifying false HPV vaccine information on social media.
Methods
This study used machine learning and natural language processing to develop a series of classification models and causality mining methods to identify and examine true and false HPV vaccine–related information on Twitter.
Results
We found that the convolutional neural network model outperformed all other models in identifying tweets containing false HPV vaccine–related information (F score=91.95). We also developed completely unsupervised causality mining models to identify HPV vaccine candidate effects for capturing risk perceptions of HPV vaccines. Furthermore, we found that false information contained mostly loss-framed messages focusing on the potential risk of vaccines covering a variety of topics using more diverse vocabulary, while true information contained both gain- and loss-framed messages focusing on the effectiveness of vaccines covering fewer topics using relatively limited vocabulary.
Conclusions
Our research demonstrated the feasibility and effectiveness of using predictive models to identify false HPV vaccine information and its risk perceptions on social media.
The present study evaluated the potential use of Twitter data for providing risk indices of STIs. We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses, across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from the same year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique using free social media data provides signals of community health at a high temporal and spatial resolution.
We study the problem of automatically identifying humorous text from a new kind of text data, i.e., online reviews. We propose a generative language model, based on the theory of incongruity, to model humorous text, which allows us to leverage background text sources, such as Wikipedia entry descriptions, and enables construction of multiple features for identifying humorous reviews. Evaluation of these features using supervised learning for classifying reviews into humorous and non-humorous reviews shows that the features constructed based on the proposed generative model are much more effective than the major features proposed in the existing literature, allowing us to achieve almost 86% accuracy. These humorous review predictions can also supply good indicators for identifying helpful reviews.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.