Journal Pre-proof
GIS-based Spatial Modeling of COVID-19 Incidence Rate in the Continental United States
Abstract
During the first 90 days of the COVID-19 outbreak in the United States, over 675,000 confirmed cases of the disease were announced, posing an unprecedented socioeconomic burden on the country. Because research on geographic modeling of COVID-19 has been inadequate, we investigated county-level variations of disease incidence across the continental United States. We compiled a geodatabase of 35 environmental, socioeconomic, topographic, and demographic variables that could explain the spatial variability of disease incidence. Further, we employed spatial lag and spatial error models to investigate spatial dependence, and geographically weighted regression (GWR) and multiscale GWR (MGWR) models to locally examine spatial non-stationarity. The results suggested that even though incorporating spatial autocorrelation could significantly improve the performance of the global ordinary least squares model, these models still performed significantly worse than the local models. Moreover, MGWR explained the largest share of variation (adj. R²: 68.1%) with the lowest AICc among the models compared. Mapping the effects of significant explanatory variables (i.e., income inequality, median household income, the proportion of black females, and the proportion of nurse practitioners) on spatial variability of COVID-19 incidence rates using MGWR could provide useful insights to policymakers for targeted interventions.
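To illustrate what "locally examining spatial non-stationarity" means, the following is a minimal sketch of the idea behind GWR, not the authors' implementation. It assumes a single predictor, a fixed Gaussian kernel, and made-up toy data; the function names and the bandwidth value are hypothetical. MGWR additionally allows a separate bandwidth per covariate, which this sketch does not attempt.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight decays smoothly with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Fit y = intercept + slope * x by weighted least squares around `target`."""
    weights = [gaussian_weight(math.dist(c, target), bandwidth) for c in coords]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Toy data (made up): two distant clusters with opposite relationships.
coords = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0, -3.0, -6.0, -9.0]

west = gwr_local_fit(coords, x, y, target=(1, 0), bandwidth=2.0)
east = gwr_local_fit(coords, x, y, target=(101, 0), bandwidth=2.0)
print("west (intercept, slope):", west)
print("east (intercept, slope):", east)
```

In this toy setting the local slope is close to 2 around the western cluster and close to -3 around the eastern one, whereas a single global regression would average the two relationships away. That contrast is the kind of spatial variation a local model is meant to expose.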
Prediction of the COVID-19 incidence rate is a matter of global importance, particularly in the United States. As of 4 June 2020, more than 1.8 million confirmed cases and over 108 thousand deaths have been reported in this country. Few studies have examined nationwide modeling of COVID-19 incidence in the United States, particularly using machine-learning algorithms. Thus, we collected and prepared a database of 57 candidate explanatory variables to examine the performance of a multilayer perceptron (MLP) neural network in predicting the cumulative COVID-19 incidence rates across the continental United States. Our results indicated that a single-hidden-layer MLP could explain almost 65% of the correlation with ground truth for the holdout samples. Sensitivity analysis conducted on this model showed that the age-adjusted mortality rates of ischemic heart disease, pancreatic cancer, and leukemia, together with two socioeconomic and environmental factors (median household income and total precipitation), are among the most substantial factors for predicting COVID-19 incidence rates. Moreover, results of the logistic regression model indicated that these variables could explain the presence or absence of the hotspots of disease incidence that were identified by Getis-Ord Gi* (p < 0.05) in a geographic information system environment. The findings may provide useful insights for public health decision makers regarding the influence of potential risk factors associated with COVID-19 incidence at the county level.
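For readers unfamiliar with the hotspot step, here is a minimal sketch of the Getis-Ord Gi* z-score, not the GIS tool the authors used. It assumes binary fixed-distance weights, uses made-up values on a line of ten locations, and treats z > 1.96 as the hotspot cutoff (an illustrative choice corresponding to a two-sided p < 0.05). The function name and threshold are hypothetical.

```python
import math


def getis_ord_gi_star(coords, values, threshold):
    """Return the Gi* z-score of every location using binary distance weights.

    Assumes no location has every other location as a neighbor, since that
    would make the denominator zero.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for ci in coords:
        # Binary weights; the focal location itself is included (the "star").
        w = [1.0 if math.dist(ci, cj) <= threshold else 0.0 for cj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = std * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Toy data (made up): low rates everywhere except a cluster at one end.
coords = [(i, 0) for i in range(10)]
values = [1.0] * 7 + [10.0] * 3

z_scores = getis_ord_gi_star(coords, values, threshold=1.0)
hotspots = [i for i, z in enumerate(z_scores) if z > 1.96]
print("hotspot indices:", hotspots)
```

The resulting presence/absence labels are the kind of binary outcome that a logistic regression could then be fitted to, as the abstract describes.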
Measurements of human interaction through proxies such as social connectedness or movement patterns have proved useful for predictive modeling of COVID-19, which is a challenging task, especially at high spatial resolutions. In this study, we develop a spatiotemporal autoregressive model to predict county-level new cases of COVID-19 in the coterminous US using spatiotemporal lags of infection rates, human interactions, human mobility, and socioeconomic composition of counties as predictive features. We capture human interactions through 1) Facebook-derived and 2) cell phone-derived measures of connectivity and human mobility, and use them in two separate models for predicting county-level new cases of COVID-19. We evaluate the model on 14 forecast dates between 2020/10/25 and 2021/01/24 over one- to four-week prediction horizons. Comparing our predictions with a baseline model developed by the COVID-19 Forecast Hub indicates an average 6.46% improvement in prediction mean absolute error (MAE) over the two-week prediction horizon and up to a 20.22% improvement over the four-week prediction horizon, pointing to the strong predictive power of our model at the longer prediction horizons.
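The comparison against the baseline can be made concrete with a small sketch. It assumes that "improvement" means the relative reduction in MAE with respect to the baseline, which the abstract does not state explicitly; the numbers below are made up and the function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted counts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_improvement(baseline_mae, model_mae):
    """Relative MAE reduction versus the baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Toy county-level new-case counts (made up).
actual = [10, 20, 30]
baseline_forecast = [14, 16, 34]
model_forecast = [11, 18, 33]

baseline_mae = mean_absolute_error(actual, baseline_forecast)  # 4.0
model_mae = mean_absolute_error(actual, model_forecast)        # 2.0
print(percent_improvement(baseline_mae, model_mae))            # 50.0
```

A positive value means the model's errors are smaller than the baseline's, and a negative value means the baseline did better.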
Highlights
Lower respiratory infections (LRI) are the cause of a significant number of hospitalizations in the US.
No previous nationwide study examined geographic variations of LRI mortality rates and their association with underlying factors.
The location of LRI hotspots shifted from the West Coast to the Southeast over time.
Decision tree classifiers could predict LRI mortality hotspots with high accuracy.
Higher spring temperature and increased precipitation during winter were among the most substantial predictors of presence or absence of LRI hotspots.