The contribution of this paper is twofold. First, a new data-driven approach for predicting the Covid-19 pandemic dynamics is introduced. Second, the paper reports and discusses the results obtained with this approach for the Brazilian states, with predictions starting as of 4 May 2020. As a preliminary study, we first used a Long Short Term Memory for Data Training-SAE (LSTM-SAE) network model. Although this first approach led to somewhat disappointing results, it served as a good baseline for testing other artificial neural network (ANN) types. Subsequently, in order to identify relevant countries and regions to be used for training ANN models, we conducted a clustering of the world’s regions where the pandemic is at an advanced stage. This clustering is based on manually engineered features representing a country’s response to the early spread of the pandemic, and the resulting clusters are used to select the relevant countries for training the models. The final models retained are Modified Auto-Encoder (MAE) networks, which are trained on these clusters and learn to predict future data for Brazilian states. These predictions are used to estimate important statistics about the disease, such as peaks and numbers of confirmed cases. Finally, curve fitting is carried out to find the distribution that best fits the outputs of the MAE and to refine the estimates of the peaks of the pandemic. Predicted numbers reach a total of more than one million infected Brazilians, distributed among the different states, with São Paulo leading with about 150 thousand predicted confirmed cases. The results indicate that the pandemic is still growing in Brazil, with most states' peaks of infection estimated in the second half of May 2020. The estimated end of the pandemic (97% of cases reaching an outcome) falls between June and the end of August 2020, depending on the state.
Background. Epidemiological figures of the Covid-19 epidemic in Italy are worse than those observed in China. Methods. We modeled the Covid-19 outbreak in Italian Regions vs. Lombardy to assess the progression of the epidemic and predict peaks of new daily infections and total cases by learning from the entire Chinese epidemiological dynamics. We trained an artificial neural network model, a modified autoencoder, with Chinese Covid-19 data to forecast the epidemic curve of the different Italian regions, and used the susceptible-exposed-infected-removed (SEIR) compartment model to predict the spreading and peaks. We estimated the basic reproduction number R0 (the average number of people that can be infected by a person who has already acquired the infection) both by fitting the exponential growth rate of the infection across a 1-month period and by using a day-by-day assessment based on single observations. Results. The expected peak of the SEIR model for new daily cases was at the end of March at the national level. The peak of overall positive cases is expected by April 11th in Southern Italian Regions, a couple of days after that of Lombardy and Northern regions. According to our model, total confirmed cases in all Italian regions could reach 160,000 by April 30th and stabilize at a plateau. Conclusions. Training neural networks on Chinese data and using the knowledge to forecast the Italian spreading of Covid-19 has resulted in a good fit, measured with the mean average precision between official Italian data and the forecast.
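The two Methods steps named above (estimating R0 from the exponential growth rate, then using an SEIR model to locate the peak) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the latent and infectious periods, and the one-day Euler step are all assumptions made here for illustration, and the growth-rate-to-R0 formula is the standard relation for an SEIR model with exponentially distributed stage durations, which the abstract does not confirm was the one used.

```python
import math


def growth_rate(cases):
    """Least-squares slope of log(cases) against the day index (per-day rate r)."""
    n = len(cases)
    xs = list(range(n))
    ys = [math.log(c) for c in cases]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def r0_from_growth(r, latent_days, infectious_days):
    """R0 = (1 + r*T_E)(1 + r*T_I), valid for an SEIR model with
    exponentially distributed latent and infectious periods (assumption)."""
    return (1 + r * latent_days) * (1 + r * infectious_days)


def simulate_seir(population, r0, latent_days, infectious_days, days,
                  initial_infected=1.0):
    """Discrete-time SEIR with a one-day Euler step.

    Returns the daily number of new infectious cases (E -> I transitions).
    All parameter values are hypothetical inputs, not fitted values.
    """
    sigma = 1.0 / latent_days       # rate of leaving the exposed state
    gamma = 1.0 / infectious_days   # rate of leaving the infectious state
    beta = r0 * gamma               # transmission rate implied by R0
    s = population - initial_infected
    e, i, r = 0.0, initial_infected, 0.0
    new_cases = []
    for _ in range(days):
        newly_exposed = beta * s * i / population
        newly_infectious = sigma * e
        newly_removed = gamma * i
        s -= newly_exposed
        e += newly_exposed - newly_infectious
        i += newly_infectious - newly_removed
        r += newly_removed
        new_cases.append(newly_infectious)
    return new_cases


def peak_day(curve):
    """Index of the day with the largest value in the curve."""
    return max(range(len(curve)), key=lambda d: curve[d])
```

In use, growth_rate would be applied to roughly one month of observed case counts, its result passed to r0_from_growth, and the resulting R0 fed to simulate_seir so that peak_day can read off the day of maximum new cases. A finer time step or an ODE solver would be a natural refinement of the one-day Euler update.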
This paper makes a twofold contribution. The first is a data-driven approach for predicting the Covid-19 pandemic dynamics, based on data from countries where the pandemic is more advanced. The second is to report and discuss the results obtained with this approach for Brazilian states, as of May 4th, 2020. We start by presenting preliminary results obtained by training an LSTM-SAE network, which are somewhat disappointing. Then, our main approach consists of an initial clustering of the world regions for which data is available and where the pandemic is at an advanced stage, based on a set of manually engineered features representing a country's response to the early spread of the pandemic. A Modified Auto-Encoder (MAE) network is then trained on these clusters and learns to predict future data for Brazilian states. These predictions are used to estimate important statistics about the disease, such as peaks. Finally, curve fitting is carried out on the predictions in order to find the distribution that best fits the outputs of the MAE and to refine the estimates of the peaks of the pandemic. Results indicate that the pandemic is still growing in Brazil, with most states' peaks of infection estimated between the 25th of April and the 19th of May 2020. Predicted numbers reach a total of 240 thousand infected Brazilians, distributed