The COVID-19 pandemic in Indonesia started with 2 cases on March 2, 2020, and as of May 11, a total of 14 265 people had been infected. The government, through the Task Force for COVID-19 Rapid Response, reports on the progress of the pandemic in Indonesia, but no one has yet provided a picture of the risk distribution across all provinces. This research aims to identify high-risk provinces based on the risk factors in each province and to find COVID-19 hotspots. This is an ecological study using aggregate data. We used a map to present the risk distribution and Local Indicators of Spatial Association (LISA) to define the COVID-19 hotspot areas. Six provinces were identified as high-risk areas, and the hotspot provinces are Banten, DKI Jakarta, West Java, East Java, and Central Java.
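As an illustration of how a LISA statistic can flag hotspots, the following is a minimal sketch of the local Moran's I with row-standardized neighbour weights. The values and the neighbour lists are synthetic and hypothetical, not the study's data, and the sketch omits the permutation-based significance test that a full LISA analysis would normally include.

```python
def local_morans_i(values, neighbors):
    """Return (I_i, z_i, spatial_lag_i) for every region.

    values    : list of observed values, one per region
    neighbors : dict mapping region index -> list of neighbouring indices
    Weights are row-standardized, so the spatial lag is the mean of the
    neighbours' deviations from the overall mean.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    result = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append((z[i] * lag / m2, z[i], lag))
    return result


def quadrant(z_i, lag_i):
    """Label the Moran scatterplot quadrant; 'HH' marks a candidate hotspot."""
    return ("H" if z_i > 0 else "L") + ("H" if lag_i > 0 else "L")


# Synthetic example: five regions arranged along a line (hypothetical data).
toy_values = [10, 12, 11, 1, 2]
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

lisa = local_morans_i(toy_values, toy_neighbors)
labels = [quadrant(z_i, lag_i) for _, z_i, lag_i in lisa]
print(labels)
```

A region labelled `HH` has an above-average value and above-average neighbours, which is the pattern usually read as a hotspot once it is also statistically significant.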
Around 60% of COVID-19 positive cases in Indonesia have occurred on Java Island. This study clusters adjacent regions (cities and regencies) on Java Island into groups based on socio-economic factors suspected to affect the COVID-19 infection rate (positive cases per 100,000 residents), which could be useful for government decision making. The factors considered are poverty percentage, Human Development Index (HDI), average expenditure per month, and open unemployment rate. The data analysis has two steps: first, we used the lasso to determine which factors significantly affected the infection rate, and then we used the generalized lasso to estimate region-specific effects of each significant factor. In the generalized lasso, two types of spatial structure were considered: regions divided by province, and neighbourhood regions based on k-means clustering and Voronoi tessellation. The tuning parameter in both the lasso and the generalized lasso was selected by 5-fold cross-validation. In the first step, three variables were found to affect the infection rate significantly. In the second step, all three variables had spatially varying coefficients in the generalized lasso when regions were divided by province. By contrast, only HDI had a spatially varying coefficient in the generalized lasso when regions were based on k-means clustering and Voronoi tessellation.
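To make the first step concrete, here is a minimal sketch of lasso estimation by coordinate descent with soft-thresholding. It assumes the objective (1/(2n))·||y − Xβ||² + λ·||β||₁ and uses synthetic, centered toy data with two hypothetical features. It is not the authors' implementation, and it leaves out both the 5-fold cross-validation and the generalized lasso step.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta


# Synthetic data: the response depends on feature 0 only (hypothetical values).
X_toy = [[-1, 1], [0, -1], [1, 1], [-1, -1], [0, 1], [1, -1]]
y_toy = [2 * row[0] for row in X_toy]

beta_small = lasso_coordinate_descent(X_toy, y_toy, lam=0.1)
beta_large = lasso_coordinate_descent(X_toy, y_toy, lam=2.0)
print(beta_small, beta_large)
```

With a small penalty the irrelevant feature is set exactly to zero while the relevant one is only shrunk; with a large enough penalty every coefficient is zero. This is the selection behaviour the first step relies on.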
Spatial clustering with spatially varying coefficient models is useful for identifying regions that share common effects of variables in spatial data. This study focuses on selecting the optimum tuning parameter of the generalized lasso for clustering with the spatially varying coefficient model. k-fold cross-validation (CV) may fail to split spatial data into a training set and a testing set if a region contains only a few observations. Moreover, k-fold CV is known to give a biased estimate of the out-of-sample prediction error. We therefore investigated the performance of approximate leave-one-out cross-validation (ALOCV) in comparison with k-fold CV for selecting the tuning parameter in a simulation study on a 2-dimensional grid. ALOCV yielded a smaller error than k-fold CV and appropriately detected the edges whose differences were shrunk by the generalized lasso. ALOCV was then applied to select the optimum tuning parameter of the generalized lasso when fitting the spatially varying coefficient model to the Chicago crime data. The result of the selection by ALOCV agreed with the conclusion suggested in the preceding literature. Clustering into regions in advance to make k-fold CV feasible may lead to an incorrect clustering result with a spatially varying coefficient model.
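The splitting problem described above can be illustrated with a small check, sketched here under simple assumptions: each observation carries a region label and a fold number, and a fold is considered problematic when holding it out leaves some region with no training observations. The region labels, the fold assignment, and the helper name are hypothetical. The sketch does not implement ALOCV itself.

```python
def regions_without_training_data(region_labels, fold_ids):
    """Map each held-out fold to the regions that vanish from its training set."""
    all_regions = set(region_labels)
    problems = {}
    for fold in sorted(set(fold_ids)):
        # Regions that still have at least one observation outside this fold.
        training_regions = {
            region for region, f in zip(region_labels, fold_ids) if f != fold
        }
        missing = sorted(all_regions - training_regions)
        if missing:
            problems[fold] = missing
    return problems


# Synthetic example: region "B" has a single observation (hypothetical data).
toy_regions = ["A"] * 4 + ["B"] + ["C"] * 5
toy_folds = [i % 5 for i in range(len(toy_regions))]  # simple 5-fold assignment

problems = regions_without_training_data(toy_regions, toy_folds)
print(problems)
```

In this toy split, holding out the fold that contains the only observation of region `B` leaves no data from which a coefficient specific to that region could be estimated.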
Panel data describe a setting in which many units are each observed repeatedly over a period of time. Clustering observations in this setting is known as clustering of time series data. Many methods have been developed for fluctuating time series. However, missing data cause problems in this analysis. Missing data are data values that are unavailable for an observation because no related information exists. This study proposes an alternative method for clustering time series that contain missing data: correlation matrices are converted into Euclidean distance matrices, to which hierarchical clustering is then applied. A simulation was carried out to compare the performance of the alternative method with that of the commonly used method under 0%, 10%, 20%, and 40% missing data. The clustering accuracy of the proposed method was always better than that of the commonly used method. The method was then applied to the annual Gini ratio of each Indonesian province from 2007 to 2017, which contains missing data for North Kalimantan Province. Two clusters of provinces with different characteristics were obtained.
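The idea of clustering through correlation-based distances can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: correlations are computed on pairwise-complete time points, the conversion d = sqrt(2(1 − r)) is one common way to turn a correlation into a Euclidean-type distance, and average linkage is chosen arbitrarily for the hierarchical step. The series are synthetic, with `None` marking a missing value.

```python
import math


def pairwise_corr(a, b):
    """Pearson correlation over time points observed in both series."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def corr_distance(a, b):
    """Assumed conversion: d = sqrt(2 * (1 - r)), clipped at zero for rounding."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - pairwise_corr(a, b))))


def average_linkage(series, n_clusters):
    """Agglomerative clustering with average linkage on correlation distances."""
    m = len(series)
    dist = {
        (i, j): corr_distance(series[i], series[j])
        for i in range(m)
        for j in range(i + 1, m)
    }

    def between(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(m)]
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, p, q = min(
            (between(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Synthetic series: two rising, two falling, some values missing (hypothetical).
toy_series = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, None, 8, 10, 12],
    [6, 5, 4, 3, 2, 1],
    [9, None, 7, 5, 4, 2],
]
clusters = average_linkage(toy_series, n_clusters=2)
print(clusters)
```

Because each correlation uses only the time points available in both series, no observation has to be dropped or imputed before the distance matrix is built.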