2015
DOI: 10.1371/journal.pcbi.1004513

Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance

Abstract: We present a machine learning-based methodology capable of providing real-time (“nowcast”) and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illnesses (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI …

Cited by 373 publications (377 citation statements)
References 33 publications
“…ARGO can be easily generalized to any temporal and spatial scales for a variety of diseases or social events amenable to be tracked by Internet searches or services (3,4,8,9,29,30,38,39). Further improvements in influenza prediction may come from combining multiple predictors constructed from disparate data sources (40). After the initial submission of this article in May 2015, Google announced that GFT would be discontinued and that their raw data would be made accessible to selected scientific teams.…”
Section: Strength of ARGO (mentioning)
confidence: 99%
“…Early detection of disease outbreaks is not a new issue but it remains crucial. In 2015, still many researchers [8][9][10] tried to develop the ultimate surveillance system using search engines, social network, or wiki data. While authors combined more and more data sources for their surveillance systems, the use of health system data remained limited.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, their prediction model provides good estimates. Santillana et al [9] evaluated with promising results the same data sources, as well as some others, against the Center for Disease Control's (CDC) gold standard for influenza-like illness surveillance. Lejeune et al [10] proposed an innovative way to detect epidemic events in news articles based on string character repetition.…”
Section: Discussion (mentioning)
confidence: 99%
“…The first is MEAN, the average distribution seen in the training data. The second are SVMs with linear kernels, which worked well in predicting influenza activity in a similar set-up (Santillana et al, 2015). For the baselines, we encode the distributions from the 4 previous days as a flattened 12-dim.…”
Section: Forecasting Sentiment Dynamics (mentioning)
confidence: 99%
“…A body of work has also used predictive signals in Twitter to track and sense upcoming unrest and protests in specific countries (Ramakrishnan et al, 2014; Goode et al, 2015), and the future progression of flu activity based on multiple text sources (Santillana et al, 2015). In contrast, we focus on predicting the sentiment dynamics in social media based on previous trends.…”
Section: Related Work (mentioning)
confidence: 99%