Background
Previous studies have suggested associations between web search trends and traditional COVID-19 metrics. It remains unclear whether models that incorporate digital search trends yield better predictions.
Objective
The aim of this study is to investigate the relationship between Google Trends searches of symptoms associated with COVID-19 and confirmed COVID-19 cases and deaths. We aim to develop predictive models to forecast the COVID-19 epidemic based on a combination of Google Trends searches of symptoms and conventional COVID-19 metrics.
Methods
An open-access web application was developed to evaluate Google Trends and traditional COVID-19 metrics via an interactive framework based on principal component analysis (PCA) and time series modeling. The application facilitates the analysis of symptom search behavior associated with COVID-19 disease in 188 countries. In this study, we selected the data of nine countries as case studies to represent all continents. PCA was used to perform data dimensionality reduction, and three different time series models (error, trend, seasonality; autoregressive integrated moving average; and feed-forward neural network autoregression) were used to predict COVID-19 metrics in the upcoming 14 days. The models were compared in terms of prediction ability using the root mean square error (RMSE) of the first principal component (PC1). The predictive abilities of models generated with both Google Trends data and conventional COVID-19 metrics were compared with those fitted with conventional COVID-19 metrics only.
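The pipeline described above (reduce several series to PC1, forecast, score by RMSE) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy numbers are made up, the function names are hypothetical, PC1 is obtained by power iteration on the correlation matrix, and a naive carry-forward forecast over a 3-day hold-out stands in for the error, trend, seasonality; autoregressive integrated moving average; and neural network autoregression models and the 14-day horizon used in the study.

```python
import math

def standardize(columns):
    """Scale each column to zero mean and unit (sample) variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def first_principal_component(columns, iterations=200):
    """Return (loadings, scores) of PC1 via power iteration on the correlation matrix.

    The uniform starting vector is adequate for positively correlated series;
    it would fail only if it were orthogonal to the leading eigenvector.
    """
    z = standardize(columns)
    n, p = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[j] * z[j][t] for j in range(p)) for t in range(n)]
    return v, scores

def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy daily series (hypothetical numbers, not data from the study).
cases = [10, 14, 19, 27, 33, 41, 52, 60, 71, 85]
deaths = [1, 1, 2, 3, 3, 5, 6, 6, 8, 9]
searches = [30, 35, 47, 52, 66, 70, 81, 95, 99, 110]

loadings, pc1 = first_principal_component([cases, deaths, searches])

# Hold out the last 3 days; the naive forecast repeats the last training value.
train, test = pc1[:-3], pc1[-3:]
naive_forecast = [train[-1]] * len(test)
print("PC1 loadings:", [round(x, 3) for x in loadings])
print("hold-out RMSE:", round(rmse(test, naive_forecast), 3))
```

Comparing the RMSE obtained with and without the `searches` column mirrors, in miniature, the comparison between models fitted with and without Google Trends data.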
Results
The degree of correlation and the best time lag varied as a function of the selected country and topic searched; in general, the optimal time lag was within 15 days. Overall, predictions of PC1 based on both search terms and traditional COVID-19 metrics performed better than those not including Google searches (median RMSE 1.56, IQR 0.90-2.49 versus median RMSE 1.87, IQR 1.09-2.95, respectively), but the improvement in prediction varied as a function of the selected country and time frame. The best model varied as a function of the country and the time period selected. Models based on a 7-day moving average led to considerably smaller RMSE values than those calculated with raw data (median 0.90, IQR 0.50-1.53 versus median 2.27, IQR 1.62-3.74, respectively).
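As a rough illustration of the two operations mentioned above, the sketch below smooths a series with a 7-day moving average and scans lags of 0 to 15 days for the highest Pearson correlation between searches and cases. The helper names and the synthetic bump-shaped series are assumptions made for this example. The abstract does not say whether the moving average was trailing or centred; a trailing window is assumed here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def moving_average(series, window=7):
    """Trailing moving average; the first window - 1 points are dropped."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def best_lag(searches, cases, max_lag=15):
    """Find the lag (in days) at which searches[t] best correlates with cases[t + lag]."""
    correlations = {}
    for lag in range(max_lag + 1):
        x = searches[:len(searches) - lag]
        y = cases[lag:]
        correlations[lag] = pearson(x, y)
    lag = max(correlations, key=correlations.get)
    return lag, correlations[lag]

def bump(t, peak):
    """Synthetic bell-shaped wave peaking on day `peak` (illustrative only)."""
    return math.exp(-((t - peak) / 6) ** 2)

# Synthetic example: the case wave trails the search wave by 5 days.
searches = [bump(t, 20) for t in range(60)]
cases = [bump(t, 25) for t in range(60)]

lag, r = best_lag(searches, cases)
print("best lag (days):", lag, "correlation:", round(r, 4))
print("smoothed length:", len(moving_average(cases)))
```

With real data, both series would typically be smoothed before the lag scan, since day-of-week reporting effects can otherwise dominate the correlation.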
Conclusions
The inclusion of digital online searches in statistical models may improve the nowcasting and forecasting of the COVID-19 epidemic, and such searches could serve as one component of COVID-19 surveillance. We provide a free web application operating with nearly real-time data that anyone can use to make predictions of outbreaks, improve estimates of the dynamics of ongoing epidemics, and predict future or rebound waves.
The present study investigated the capabilities and performance of semi-continuous and fully continuous probabilistic approaches to DNA mixture interpretation, particularly when dealing with low-template DNA mixtures. Five statistical interpretation software packages (Lab Retriever and LRmix Studio, which use semi-continuous algorithms, and DNA•VIEW, EuroForMix and STRmix, which employ fully continuous formulae) were used to calculate likelihood ratios comparing the prosecution and defense hypotheses for a series of purpose-prepared DNA mixtures containing either 2 or 3 known contributors. National Institute of Standards and Technology (NIST) certified templates, containing different DNA amounts for each contributor, were used to set up the samples. Two-person mixtures were prepared in proportions of 1:1, 19:1 and 1:19 in terms of DNA concentration. Three-person mixtures were prepared in proportions of 20:9:1, 8:1:1, 6:3:1 and 1:1:1 in terms of DNA concentration. Furthermore, 8 equally proportioned 3-person mixtures were prepared by serial dilution, starting from an overall amount of 0.500 ng and going down to samples containing 0.004 ng (i.e. low-template DNA). DNA mixtures were set up in triplicate and amplified with 7 DNA amplification kits (GlobalFiler PCR Amplification Kit, NGM SElect PCR Amplification Kit, MiniFiler PCR Amplification Kit, PowerPlex Fusion, PowerPlex 6C Matrix System, PowerPlex ESI 17 Fast and PowerPlex ESX 17 Fast) to evaluate whether the choice of kit might introduce a bias capable of altering the whole interpretation process. A multi-software approach helped us to highlight trends in the likelihood ratio results provided by semi- and fully continuous software.
Notably, fully continuous computations yielded higher likelihood ratio values, by orders of magnitude, than the semi-continuous approach, regardless of the amplification kit used.
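To make "orders of magnitude" concrete: a likelihood ratio is the probability of the evidence under the prosecution hypothesis divided by its probability under the defense hypothesis, and results are commonly compared on a log10 scale. The sketch below is only an arithmetic illustration. The per-locus values are hypothetical, the function names are invented for this example, and multiplying per-locus ratios assumes independence between loci; none of it reproduces the internal models of the software packages named above.

```python
import math

def likelihood_ratio(p_evidence_given_hp, p_evidence_given_hd):
    """LR = P(E | prosecution hypothesis) / P(E | defense hypothesis)."""
    return p_evidence_given_hp / p_evidence_given_hd

def combined_log10_lr(per_locus_lrs):
    """Sum of log10 per-locus LRs, i.e. log10 of their product (assumes independent loci)."""
    return sum(math.log10(lr) for lr in per_locus_lrs)

def orders_of_magnitude_apart(log10_lr_a, log10_lr_b):
    """Difference between two results expressed on the log10 scale."""
    return log10_lr_a - log10_lr_b

# Hypothetical per-locus LRs for the same profile from two kinds of model.
semi_continuous = [5.0, 20.0, 8.0, 12.5]
fully_continuous = [50.0, 200.0, 80.0, 125.0]

log_semi = combined_log10_lr(semi_continuous)
log_full = combined_log10_lr(fully_continuous)
print("log10 LR (semi-continuous): ", round(log_semi, 2))
print("log10 LR (fully continuous):", round(log_full, 2))
print("orders of magnitude apart:  ", round(orders_of_magnitude_apart(log_full, log_semi), 2))
```

Working on the log10 scale also avoids numerical overflow or underflow when many loci are combined.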