flu season (2). The report also found strong evidence of autocorrelation and seasonality in the GFT errors, and presented evidence that the issues were likely due, at least in part, to modifications made to Google's search algorithm and to the decision by GFT engineers not to use previous CDC reports or seasonality estimates in their models (what the article labeled "algorithm dynamics" and "big data hubris," respectively). Moreover, the report and the supporting online materials detailed how difficult, if not impossible, it is to replicate the GFT results, undermining independent efforts to explore the source of GFT errors and formulate improvements.

To address the accuracy problems from the 2012-2013 flu season, GFT engineers announced modifications to the algorithm at the annual conference of the International Society of Neglected Tropical Diseases (3). These modifications relied on the assumption that increased media coverage of the flu during the 2012-2013 season was the cause of the error, an assumption shared by almost all of the media coverage of the problem (1,4,5). Two changes were made: (1) dampening anomalous media spikes and (2) using ElasticNet, rather than regression, for estimation.

GFT still stands as a triumph of big data engineering. This is why it is so critical that continued monitoring and re-evaluation of the results take place, not just within the GFT team but also within the larger academic community.

So have these changes corrected the problem? While it is impossible to say for sure based on one subsequent season, the evidence so far does not look promising. First, the problems identified with replication in GFT appear to have gotten worse, if anything. Second, the evidence that the problems in 2012-2013 were due to media coverage is tenuous.
While GFT engineers have shown that there was a spike in coverage during the 2012-2013 season, it seems unlikely that this spike was larger than those during the 2005-2006 A/H5N1 ("bird flu") outbreak and the 2009 A/H1N1 ("swine flu") pandemic. Moreover, media coverage does not explain why the proportional errors were so large in the 2011-2012 season. Finally, while the changes made have dampened the propensity for overestimation by GFT, they have not eliminated the autocorrelation and seasonality problems in the data.
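To make concrete what autocorrelation and seasonality in estimation errors look like, the following is a minimal sketch of a sample autocorrelation calculation. It is an illustration only: the error series is synthetic and invented here (a pure 52-week sine wave over two years), not GFT or CDC data, and the function name `lag_autocorrelation` is our own.

```python
import math
from statistics import fmean


def lag_autocorrelation(errors, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(errors)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(errors) - 1")
    mean = fmean(errors)
    centered = [e - mean for e in errors]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of each centered value with the value `lag` steps earlier.
    numer = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return numer / denom


# Hypothetical weekly errors (estimate minus reported value), invented for
# illustration: a purely seasonal pattern with a 52-week period over 104 weeks.
synthetic_errors = [math.sin(2 * math.pi * week / 52) for week in range(104)]

lag_1 = lag_autocorrelation(synthetic_errors, lag=1)    # week-to-week persistence
lag_52 = lag_autocorrelation(synthetic_errors, lag=52)  # same week, one year earlier

print(f"lag-1 autocorrelation:  {lag_1:.3f}")
print(f"lag-52 autocorrelation: {lag_52:.3f}")
```

If the errors of a model were unstructured noise, both values would be close to zero. Values well above zero at lag 1 indicate that one week's error predicts the next week's, and values well above zero at a 52-week lag indicate a recurring seasonal pattern; either suggests that information available to the modeler is being left unused.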
Making the Black Box Still Darker?

One of our main concerns about GFT is the degree to which the estimates are a product of a highly nontransparent process. Science is a cumulative enterprise, and progress requires that the community be able to continually assess the work on which it is building (6,7). GFT has not been very forthcoming with this information in the past, going so far as to release misleading example search terms in previous publications (2,3,8).

These transparency problems have, if anything, become worse. While the data on the intensity of media coverage of flu outbreaks do not involve privacy concerns, GFT has not released these data, nor has it provided an explanation of how the information was collected and utilized. This information is criti...