2021
DOI: 10.2196/25314

Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set

Abstract: Background: In the United States, the rapidly evolving COVID-19 outbreak, the shortage of available testing, and the delay of test results present challenges for actively monitoring its spread based on testing alone. Objective: The objective of this study was to develop, evaluate, and deploy an automatic natural language processing pipeline to collect user-generated Twitter data as a complementary resource for identifying potential cases of COVID-19 in th…

Citations: cited by 53 publications (32 citation statements)
References: 12 publications
“…Shimkhada et al [ 27 ] used Twitter Chat to identify barriers and responsive policy of patients with metastatic breast cancer care. Klein et al [ 28 ] conducted Twitter data to track the spread of COVID-19. Johnson et al [ 29 ] performed Google Trends to monitor sexually transmitted infections Chicago.…”
Section: Discussion (mentioning)
confidence: 99%
“…Teams placing second and third achieved F1-scores of 0.77 and 0.76, respectively, using COVID-Twitter-BERT, while the teams (that submitted system descriptions) that achieved F1-scores of less than 0.76 did not use models pre-trained on tweets related to COVID-19. The leading team outperformed a benchmark classifier presented in recent work (Klein et al, 2021), which was based on COVID-Twitter-BERT and achieved an F1-score (0.76) similar to that of the teams placing second and third.…”
Section: Task 5: Classification of Tweets Self-Reporting Potential Cases of COVID-19 (mentioning)
confidence: 60%
“…Task 5 is a binary classification task that involves automatically distinguishing tweets that self-report potential cases of COVID-19 ("potential case" tweets) from those that do not ("other" tweets), where "potential case" tweets broadly include those indicating that the user or a member of the user's household was denied testing for COVID-19, showing symptoms of COVID-19, potentially exposed to cases of COVID-19, or had had experiences that pose a higher risk of exposure to COVID-19. The training set (Klein et al, 2021) contains 7181 tweets: 1148 (16%) "potential case" tweets (annotated as "1") and 6033 (84%) "other" tweets (annotated as "0"). The test set contains 1795 annotated tweets: 308 (17%) "potential case" tweets and 1487 (83%) "other" tweets.…”
Section: Task 5: Classification of Tweets Self-Reporting Potential Cases of COVID-19 (mentioning)
confidence: 99%
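
The class split quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the cited papers: the helper names `class_shares` and `f1_score` are hypothetical, and the "label everything as a potential case" baseline is an assumption added only to illustrate why the F1-score for the "potential case" class is informative on a data set this imbalanced.

```python
def class_shares(positive: int, negative: int) -> tuple[float, float]:
    """Return the (positive, negative) fractions of a labeled set."""
    total = positive + negative
    return positive / total, negative / total


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Counts as quoted in the citation statement above.
train_pos, train_neg = 1148, 6033   # "potential case" vs. "other"
test_pos, test_neg = 308, 1487

train_share = class_shares(train_pos, train_neg)
test_share = class_shares(test_pos, test_neg)

# Hypothetical baseline (an assumption, not from the source):
# predict "potential case" for every test tweet, so recall is 1
# and precision equals the positive-class share.
baseline_f1 = f1_score(tp=test_pos, fp=test_neg, fn=0)

print(f"train: {train_share[0]:.1%} positive, {train_share[1]:.1%} negative")
print(f"test:  {test_share[0]:.1%} positive, {test_share[1]:.1%} negative")
print(f"all-positive baseline F1 on test: {baseline_f1:.3f}")
```

Under these assumptions, the trivial baseline scores far below the F1-scores reported in the citation statements, which is the reason a positive-class F1 rather than plain accuracy is a sensible metric for this kind of task.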
“…However, this approach can be applied to other search engines (eg, Baidu, Yahoo, Naver), as was done in previous studies on different diseases and on COVID-19 in Hubei province, China [24,25]. Previous studies have shown that self-reported symptoms on social media networks, such as Twitter, can provide useful information to track the COVID-19 pandemic and can be used for infoveillance along with search engine data [26][27][28]. Our approach could, in principle, be applied to social media as well.…”
Section: Discussion (mentioning)
confidence: 99%