Neal Shah scite author profile

Background: The coronavirus disease (COVID-19) pandemic is a global health emergency with over 6 million cases worldwide as of the beginning of June 2020. The pandemic is historic in scope and precedent given its emergence in an increasingly digital era. Importantly, there have been concerns about the accuracy of COVID-19 case counts due to issues such as lack of access to testing and difficulty in measuring recoveries.

Objective: The aims of this study were to detect and characterize user-generated conversations that could be associated with COVID-19-related symptoms, experiences with access to testing, and mentions of disease recovery using an unsupervised machine learning approach.

Methods: Tweets were collected from the Twitter public streaming application programming interface from March 3-20, 2020, filtered for general COVID-19-related keywords and then further filtered for terms that could be related to COVID-19 symptoms as self-reported by users. Tweets were analyzed using an unsupervised machine learning approach called the biterm topic model (BTM), where groups of tweets containing the same word-related themes were separated into topic clusters that included conversations about symptoms, testing, and recovery. Tweets in these clusters were then extracted and manually annotated for content analysis and assessed for their statistical and geographic characteristics.

Results: A total of 4,492,954 tweets were collected that contained terms that could be related to COVID-19 symptoms. After using BTM to identify relevant topic clusters and removing duplicate tweets, we identified a total of 3465 (<1%) tweets that included user-generated conversations about experiences that users associated with possible COVID-19 symptoms and other disease experiences. These tweets were grouped into five main categories including first- and secondhand reports of symptoms, symptom reporting concurrent with lack of testing, discussion of recovery, confirmation of negative COVID-19 diagnosis after receiving testing, and users recalling symptoms and questioning whether they might have been previously infected with COVID-19. The co-occurrence of tweets for these themes was statistically significant for users reporting symptoms with a lack of testing and with a discussion of recovery. A total of 63% (n=1112) of the geotagged tweets were located in the United States.

Conclusions: This study used unsupervised machine learning for the purposes of characterizing self-reporting of symptoms, experiences with testing, and mentions of recovery related to COVID-19. Many users reported symptoms they thought were related to COVID-19, but they were not able to get tested to confirm their concerns. In the absence of testing availability and confirmation, accurate case estimations for this period of the outbreak may never be known. Future studies should continue to explore the utility of infoveillance approaches to estimate COVID-19 disease severity.
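The data preparation described in this abstract (a general keyword filter, a second symptom-term filter, and duplicate removal) can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the keyword sets and the function names `filter_tweets` and `biterms` are assumptions made for the example, and the code only extracts biterms (unordered pairs of words co-occurring in one short text, the unit that BTM models) without fitting a topic model.

```python
import re
from itertools import combinations

# Hypothetical keyword lists for illustration only; the study's actual
# filter terms are not listed in the abstract.
GENERAL_TERMS = {"covid", "coronavirus"}
SYMPTOM_TERMS = {"fever", "cough", "symptoms"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_tweets(tweets):
    """Keep tweets that pass both keyword stages, dropping duplicates."""
    seen = set()
    kept = []
    for text in tweets:
        tokens = tokenize(text)
        token_set = set(tokens)
        # Stage 1: general COVID-19 keywords.
        if not (token_set & GENERAL_TERMS):
            continue
        # Stage 2: terms that could relate to self-reported symptoms.
        if not (token_set & SYMPTOM_TERMS):
            continue
        # Duplicates are detected on the normalized token sequence.
        key = " ".join(tokens)
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


def biterms(text):
    """Return unordered pairs of distinct words co-occurring in one text."""
    words = sorted(set(tokenize(text)))
    return list(combinations(words, 2))
```

Calling `filter_tweets` on a list of raw tweet strings returns the subset that would move on to topic modeling, and `biterms` shows what a single short text contributes to a biterm-based model.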

Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram

Mackey, Li, Purushothaman, et al. 2020. JMIR Public Health Surveill

Background: The coronavirus disease (COVID-19) pandemic is perhaps the greatest global health challenge of the last century. Accompanying this pandemic is a parallel “infodemic,” including the online marketing and sale of unapproved, illegal, and counterfeit COVID-19 health products including testing kits, treatments, and other questionable “cures.” Enabling the proliferation of this content is the growing ubiquity of internet-based technologies, including popular social media platforms that now have billions of global users.

Objective: This study aims to collect, analyze, identify, and enable reporting of suspected fake, counterfeit, and unapproved COVID-19–related health care products from Twitter and Instagram.

Methods: This study is conducted in two phases beginning with the collection of COVID-19–related Twitter and Instagram posts using a combination of web scraping on Instagram and filtering the public streaming Twitter application programming interface for keywords associated with suspect marketing and sale of COVID-19 products. The second phase involved data analysis using natural language processing (NLP) and deep learning to identify potential sellers that were then manually annotated for characteristics of interest. We also visualized illegal selling posts on a customized data dashboard to enable public health intelligence.

Results: We collected a total of 6,029,323 tweets and 204,597 Instagram posts filtered for terms associated with suspect marketing and sale of COVID-19 health products from March to April for Twitter and February to May for Instagram. After applying our NLP and deep learning approaches, we identified 1271 tweets and 596 Instagram posts associated with questionable sales of COVID-19–related products. Generally, product introduction came in two waves, with the first consisting of questionable immunity-boosting treatments and a second involving suspect testing kits. We also detected a low volume of pharmaceuticals that have not been approved for COVID-19 treatment. Other major themes detected included products offered in different languages, various claims of product credibility, completely unsubstantiated products, unapproved testing modalities, and different payment and seller contact methods.

Conclusions: Results from this study provide initial insight into one front of the “infodemic” fight against COVID-19 by characterizing what types of health products, selling claims, and types of sellers were active on two popular social media platforms at earlier stages of the pandemic. This cybercrime challenge is likely to continue as the pandemic progresses and more people seek access to COVID-19 testing and treatment. This data intelligence can help public health agencies, regulatory authorities, legitimate manufacturers, and technology platforms better remove and prevent this content from harming the public.

A Machine Learning Approach for the Detection and Characterization of Illicit Drug Dealers on Instagram: Model Evaluation Study

Li, Xu, Shah, et al. 2019. J Med Internet Res

Background: Social media use is now ubiquitous, but the growth in social media communications has also made it a convenient digital platform for drug dealers selling controlled substances, opioids, and other illicit drugs. Previous studies and news investigations have reported the use of popular social media platforms as conduits for opioid sales. This study uses deep learning to detect illicit drug dealing on the image and video sharing platform Instagram.

Objective: The aim of this study was to develop and evaluate a machine learning approach to detect Instagram posts related to illegal internet drug dealing.

Methods: In this paper, we describe an approach to detect drug dealers by using a deep learning model on Instagram. We collected Instagram posts using a Web scraper between July 2018 and October 2018 and then compared our deep learning model against 3 different machine learning models (eg, random forest, decision tree, and support vector machine) to assess the performance and accuracy of the model. For our deep learning model, we used the long short-term memory unit in the recurrent neural network to learn the pattern of the text of drug dealing posts. We also manually annotated all posts collected to evaluate our model performance and to characterize drug selling conversations.

Results: From the 12,857 posts we collected, we detected 1228 drug dealer posts comprising 267 unique users. We used cross-validation to evaluate the 4 models, with our deep learning model reaching 95% on F1 score and performing better than the other 3 models. We also found that by removing the hashtags in the text, the model had better performance. Detected posts contained hashtags related to several drugs, including the controlled substance Xanax (1078/1228, 87.78%), oxycodone/OxyContin (321/1228, 26.14%), and illicit drugs lysergic acid diethylamide (213/1228, 17.34%) and 3,4-methylenedioxy-methamphetamine (94/1228, 7.65%). We also observed the use of communication applications for suspected drug trading through user comments.

Conclusions: Our approach using a combination of Web scraping and deep learning was able to detect illegal online drug sellers on Instagram, with high accuracy. Despite increased scrutiny by regulators and policymakers, the Instagram platform continues to host posts from drug dealers, in violation of federal law. Further action needs to be taken to ensure the safety of social media communities and help put an end to this illicit digital channel of sourcing.
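The evaluation mentioned in this abstract (cross-validation scored by F1) might look something like the sketch below. It is an illustration under stated assumptions rather than the authors' code: the number of folds, the contiguous fold layout, and the helper names `f1_score`, `k_fold_indices`, and `cross_validated_f1` are all hypothetical, and `train_fn` stands in for any model (LSTM, random forest, and so on) that returns a prediction function after training.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validated_f1(texts, labels, train_fn, k=5):
    """Average F1 over k folds; train_fn(texts, labels) must return a predictor."""
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        predict = train_fn([texts[i] for i in train], [labels[i] for i in train])
        preds = [predict(texts[i]) for i in test]
        scores.append(f1_score([labels[i] for i in test], preds))
    return sum(scores) / len(scores)
```

Running `cross_validated_f1` once per candidate model on the same annotated posts gives directly comparable scores, which is the kind of comparison the Results paragraph reports.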

A Framework Proposal for Blockchain-Based Scientific Publishing Using Shared Governance

et al. 2019

Scientific research activity is reaching a staggering growth rate, introducing new and compounding existing challenges regarding the quality of peer-review, rise of predatory journals, and larger issues involving academic integrity and fraud stemming from the increased pressure to publish. Blockchain, a distributed ledger technology, is well-suited to address some of the challenges specific to scientific publishing. Companies including ARTiFACTS, Pluto, Orvium, and ScienceMatters-EUREKA, along with academic researchers, are exploring blockchain-based solutions to facilitate research data provenance and workflows, optimize the peer-review process, introduce better incentives, and even create new research journals and platforms utilizing blockchain. Building upon a review of these efforts, we propose a governance framework for scientific publishing based on a consortium blockchain model to create a more efficient means of navigating the publishing process. At the center of this framework is a model that adopts shared governance and validated inclusion via a Democratic Autonomous Organization (DAO). A DAO is an entity wherein the organizational rules are implemented and executed via smart contracts. The DAO will be comprised of participants of validated individuals and organizations who are publishers, editors, peer-reviewers, and citizen scientists to manage and oversee the framework. The framework also maps specifically to the publication workflow of submitting, handling, peer-review, and final editorial decision-making for scientific manuscripts. The goal of this framework is to increase transparency of scientific publishing, create a "pedigree" of a manuscript's research life cycle, and democratize the publication process while maintaining the accepted workflow common to scientific publishing by journals.

Characterizing Weibo Social Media Posts From Wuhan, China During the Early Stages of the COVID-19 Pandemic: Qualitative Content Analysis

Xu, Shen, Shah, et al. 2020. JMIR Public Health Surveill

Background: The COVID-19 pandemic has reached 40 million confirmed cases worldwide. Given its rapid progression, it is important to examine its origins to better understand how people’s knowledge, attitudes, and reactions have evolved over time. One method is to use data mining of social media conversations related to information exposure and self-reported user experiences.

Objective: This study aims to characterize the knowledge, attitudes, and behaviors of social media users located at the initial epicenter of the outbreak by analyzing data from the Sina Weibo platform in Chinese.

Methods: We used web scraping to collect public Weibo posts from December 31, 2019, to January 20, 2020, from users located in Wuhan City that contained COVID-19–related keywords. We then manually annotated all posts using an inductive content coding approach to identify specific information sources and key themes including news and knowledge about the outbreak, public sentiment, and public reaction to control and response measures.

Results: We identified 10,159 COVID-19 posts from 8703 unique Weibo users. Among our three parent classification areas, 67.22% (n=6829) included news and knowledge posts, 69.72% (n=7083) included public sentiment, and 47.87% (n=4863) included public reaction and self-reported behavior. Many of these themes were expressed concurrently in the same Weibo post. Subtopics for news and knowledge posts followed four distinct timelines and evidenced an escalation of the outbreak’s seriousness as more information became available. Public sentiment primarily focused on expressions of anxiety, though some expressions of anger and even positive sentiment were also detected. Public reaction included both protective and elevated health risk behavior.

Conclusions: Between the announcement of pneumonia and respiratory illness of unknown origin in late December 2019 and the discovery of human-to-human transmission on January 20, 2020, we observed a high volume of public anxiety and confusion about COVID-19, including different reactions to the news by users, negative sentiment after being exposed to information, and public reaction that translated to self-reported behavior. These findings provide early insight into changing knowledge, attitudes, and behaviors about COVID-19, and have the potential to inform future outbreak communication, response, and policy making in China and beyond.
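Because a single Weibo post could carry more than one parent theme, the three percentages reported in this abstract are each computed against all 10,159 posts and so add up to more than 100%. A small sketch of that multi-label tally is given below; the function names `share` and `theme_shares` and the theme labels in the usage note are assumptions for illustration, not the study's coding scheme.

```python
from collections import Counter


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to 2 decimals."""
    return round(100 * count / total, 2)


def theme_shares(posts):
    """Per-theme percentage of posts, where each post has a set of themes.

    A post is counted once for every theme it carries, so the shares
    can sum to more than 100 when themes co-occur.
    """
    counts = Counter(theme for themes in posts for theme in set(themes))
    total = len(posts)
    return {theme: share(n, total) for theme, n in counts.items()}
```

For example, `theme_shares([{"news"}, {"news", "sentiment"}, {"sentiment"}, {"reaction", "news"}])` gives 75.0 for news, 50.0 for sentiment, and 25.0 for reaction. Applying `share` to the counts in the abstract reproduces the reported figures: `share(6829, 10159)` is 67.22, `share(7083, 10159)` is 69.72, and `share(4863, 10159)` is 47.87.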

Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study