Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic

Cook, S. M.; Conrad, Corrie; Fowlkes, Ashley; Mohebbi, Matthew H.

doi:10.1371/journal.pone.0023610

Cited by 333 publications (280 citation statements)

References 6 publications

Citation statements: 259 mentioning


“…However, significant discrepancies between GFT's flu estimates and those measured by the Centers for Disease Control (CDC) in subsequent years led to considerable doubt about the value of digital disease detection systems (13). Although multiple articles have identified methodological flaws in GFT's original algorithm (14)(15)(16) and have led to incremental improvements (14,16) (see also googleresearch.blogspot.com/2014/10/google-flu-trends-gets-brand-new-engine.html), a statistical framework that is theoretically sound and capable of accurate estimation is still lacking.…”

mentioning (confidence: 99%)

Accurate estimation of influenza epidemics using Google search data via ARGO

Yang, Santillana, Kou (2015). Proc. Natl. Acad. Sci. U.S.A.

353

443


“…They identified 1,152 data points that related to the flu (Ginsberg et al. 2009); however, they initially did not seek new or abnormal search patterns like the A-H1N1 influenza (Cook et al. 2011; Olson et al. 2013). Those inconsistencies within the risk network caused Google Flu to overestimate flu prevalence, making the results no longer precise and rendering them even less accurate than those of the CDC (Lazer et al. 2014; Kugler 2016).…”

Section: Case of Google Flu

mentioning (confidence: 99%)

Big Data in Organizations and the Role of Human Resource Management

Scholz (2017)


Section: Making the Black Box Still Darker?

mentioning (confidence: 99%)

“…Science is a cumulative enterprise, and progress requires the ability for the community to continually assess the work on which they are building (6, 7). GFT has not been very forthcoming with this information in the past, going so far as to release misleading example search terms in previous publications (2,3,8). These transparency problems have, if anything, become worse. While the data on the intensity of media coverage of flu outbreaks does not involve privacy concerns, GFT has not released this data nor have they provided an explanation of how the information was collected and utilized.…”


Google Flu Trends Still Appears Sick: An Evaluation of the 2013-2014 Flu Season

et al. 2014


flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely, at least in part, due to modifications made to Google's search algorithm and the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models (what the article labeled "algorithm dynamics" and "big data hubris," respectively). Moreover, the report and the supporting online materials detailed how difficult, if not impossible, it is to replicate the GFT results, undermining independent efforts to explore the source of GFT errors and formulate improvements.

To address the accuracy problems from the 2012-2013 flu season, GFT engineers announced modifications to the algorithm at the annual conference of the International Society of Neglected Tropical Diseases (3). These modifications relied on the assumption that increased media coverage of the flu during the 2012-2013 season was the cause of the error, an assumption shared by almost all of the media coverage of the problem (1,4,5). Two changes were made: (1) dampening anomalous media spikes and (2) using ElasticNet, rather than regression, for estimation.

GFT still stands as a triumph of big data engineering. This is why it is so critical that continued monitoring and re-evaluation of the results take place, not just within the GFT team but also within the larger academic community.

So have these changes corrected the problem? While it is impossible to say for sure based on one subsequent season, the evidence so far does not look promising. First, the problems identified with replication in GFT appear, if anything, to have gotten worse. Second, the evidence that the problems in 2012-2013 were due to media coverage is tenuous. While GFT engineers have shown that there was a spike in coverage during the 2012-2013 season, it seems unlikely that this spike was larger than during the 2005-2006 A/H5N1 ("bird flu") outbreak and the 2009 A/H1N1 ("swine flu") pandemic. Moreover, it does not explain why the proportional errors were so large in the 2011-2012 season. Finally, while the changes made have dampened the propensity for overestimation by GFT, they have not eliminated the autocorrelation and seasonality problems in the data.

Making the Black Box Still Darker?

One of our main concerns about GFT is the degree to which the estimates are a product of a highly nontransparent process. Science is a cumulative enterprise, and progress requires the ability for the community to continually assess the work on which they are building (6, 7). GFT has not been very forthcoming with this information in the past, going so far as to release misleading example search terms in previous publications (2,3,8). These transparency problems have, if anything, become worse. While the data on the intensity of media coverage of flu outbreaks does not involve privacy concerns, GFT has not released this data nor have they provided an explanation of how the information was collected and utilized. This information is criti...
