Mainstream machine learning approaches to predictive analytics consistently perform well across a variety of datasets, but identifying the best-performing approach for any given dataset is far less intuitive. Methods such as ensemble and transformation modeling have been developed to improve upon individual base learners and to handle datasets with large degrees of variance. Despite their increased generalizability and flexibility, ensemble approaches often sacrifice inference for predictive ability. This paper introduces an alternative approach to ensemble modeling that combines the predictive ability of an ensemble framework with localized model construction by incorporating cluster analysis as a pre-processing technique. The workflow not only outperforms independent base learners and comparative ensemble methods, but also preserves local inferential capability by manipulating cluster parameters and maintaining interpretable relative importance values and non-transformed coefficients for the overall consideration of variable importance. The paper demonstrates the ensemble technique on a dataset used to estimate rates of health insurance coverage across the state of Missouri, where the cluster pre-processing helps in understanding both local and global variable importance and interactions when predicting high-concentration areas of low health insurance coverage from demographic, socioeconomic, and geospatial variables.
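The idea of clustering first and then fitting a separate, interpretable model per cluster can be illustrated with a minimal sketch. This is not the paper's workflow: it assumes a single numeric feature, a plain k-means step, and one ordinary least squares line per cluster, and every function name and data value below is hypothetical. It shows only how per-cluster coefficients stay on the original scale and can be read directly.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # no spread in x: fall back to a constant model
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def nearest(centers, x):
    """Index of the cluster center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def kmeans_1d(xs, k, iters=50):
    """Plain k-means on one feature, seeded at evenly spaced quantiles."""
    ordered = sorted(xs)
    centers = [ordered[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(centers, x)].append(x)
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def fit_cluster_models(xs, ys, k):
    """Cluster on x, then fit one untransformed linear model per cluster."""
    centers = kmeans_1d(xs, k)
    models = []
    for i in range(k):
        members = [(x, y) for x, y in zip(xs, ys) if nearest(centers, x) == i]
        if not members:
            models.append(fit_line(xs, ys))  # empty cluster: use the global fit
            continue
        models.append(fit_line([x for x, _ in members], [y for _, y in members]))
    return centers, models


def predict(centers, models, x):
    """Route x to its nearest cluster and apply that cluster's model."""
    intercept, slope = models[nearest(centers, x)]
    return intercept + slope * x


# Hypothetical data with two regimes: y = 2x for small x, y = 50 - 3x for large x.
xs = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
ys = [2 * x for x in xs[:5]] + [50 - 3 * x for x in xs[5:]]
centers, models = fit_cluster_models(xs, ys, k=2)
```

Each entry of `models` is an (intercept, slope) pair in the original units, so the local relationship in each cluster can be inspected directly, which is the kind of local inference a single pooled model would obscure.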
The goal of this paper was to investigate poverty and inequities that are associated with vegetation. First, we performed a pixel-level linear regression on time-series and Normalized Difference Vegetation Index (NDVI) for 72 United States (U.S.) cities with a population ≥250,000 for 16 Radiometer 1-kilometer (1-km). Second, from the pixel-level regression, we selected five U.S. cities (Shrinking: Chicago, Detroit, Philadelphia, and Growing: Dallas and Tucson) that were one standard deviation above the overall r-squared mean and one standard deviation below the overall r-squared mean to show cities that were different from the typical cities. Finally, we used spatial statistics to investigate the relationship between census tract level data (i.e., poverty, population, and race) and vegetation for 2010, based on the 1-km grid cells, using Ordinary Least Squares Regression and Geographically Weighted Regression. Our results revealed that poverty-related areas were significantly correlated with positive high and/or negative high vegetation in both shrinking and growing cities. This paper contributes to the academic body of knowledge on U.S. shrinking and growing cities by using a comparative analysis with global and local spatial statistics to understand the relationship between vegetation and socioeconomic inequality.
Neighborhoods in crisis undergo abandonment and vacancy, all of which has the positive potential to drive an increase in vegetation due to clearance and natural ecosystem changes. This study addressed the following research question: What is the relationship between poverty and vegetation? Building on previous research on vegetation, our paper makes original and significant contributions to the corpus of literature on U.S. shrinking and growing cities [3,4].
First, we used a comparative analysis to study three shrinking and two growing cities that were one standard deviation above the overall r-squared mean and one standard deviation below the overall r-squared mean to show cities that were different from the typical city. Second, we used global and local spatial statistics to study the spatial relationship between vegetation and socioeconomic inequality. Finally, we employed a novel methodology to study this relationship by using 1-km grids, rather than U.S. census tracts. The findings from this research provide much-needed insight into the difference between shrinking and growing cities.
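The pixel-level regression and the one-standard-deviation selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes each pixel has one NDVI value per year, that a city is summarized by the mean r-squared of its pixels, and that the population standard deviation is used (the source does not say which). All function names, city labels, and values are hypothetical.

```python
from statistics import mean, pstdev


def trend_r_squared(years, ndvi):
    """Slope and r-squared of a simple linear regression of NDVI on year."""
    my, mv = mean(years), mean(ndvi)
    sxx = sum((t - my) ** 2 for t in years)
    syy = sum((v - mv) ** 2 for v in ndvi)
    sxy = sum((t - my) * (v - mv) for t, v in zip(years, ndvi))
    slope = sxy / sxx
    # A perfectly flat series has no variance to explain.
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r2


def city_mean_r2(years, pixels):
    """Average the per-pixel r-squared over all pixels of one city."""
    return mean(trend_r_squared(years, series)[1] for series in pixels)


def flag_atypical(city_r2):
    """Return (cities >= mean + 1 SD, cities <= mean - 1 SD) by r-squared."""
    values = list(city_r2.values())
    mu, sd = mean(values), pstdev(values)  # population SD is an assumption
    above = sorted(c for c, r in city_r2.items() if r >= mu + sd)
    below = sorted(c for c, r in city_r2.items() if r <= mu - sd)
    return above, below


# Hypothetical years and NDVI series for two pixels of one city.
years = [2000, 2001, 2002, 2003]
pixels = [[0.2, 0.3, 0.4, 0.5], [0.3, 0.3, 0.3, 0.3]]
example_city_r2 = city_mean_r2(years, pixels)

# Hypothetical per-city mean r-squared values.
city_r2 = {"A": 0.10, "B": 0.50, "C": 0.50, "D": 0.50, "E": 0.90}
above, below = flag_atypical(city_r2)
```

Cities returned in `above` and `below` are the ones whose vegetation trends fit a linear model unusually well or unusually poorly, which is the sense in which the selected cities differ from the typical city.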
The current study spatially examines the local variability of robbery rates in the City of Saint Louis, Missouri, using both census tract and block group data disaggregated and standardized to 250- and 500-m raster grid spatial scales. The Spatial Lag Model (SLM) indicated that measures of race and stability globally influence robbery rates. To explore these relationships further, Geographically Weighted Regression (GWR) was used to determine the local spatial variability. We found that the standardized census tract data appeared to be more powerful in the models, while standardized block group data were more precise. Similarly, the 250-m grid offered greater accuracy, while the 500-m grid was more robust. The GWR models explained the locally varying spatial relationships between race, stability, and robbery rates in St. Louis better than the global models. The local models indicated that social characteristics occurring at higher-order geographies may influence robbery rates in St. Louis.
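The contrast between a global model and GWR can be made concrete with a small sketch. It is a simplification, assuming a single predictor, a fixed-bandwidth Gaussian kernel, and no bandwidth selection step; practical GWR handles several predictors and chooses the bandwidth by cross-validation or an information criterion. The function name, coordinates, and values are hypothetical.

```python
import math


def gwr_local_fit(points, xs, ys, target, bandwidth):
    """Weighted least squares for y = a + b*x, centred on `target`.

    Observations are weighted by a Gaussian kernel of their distance to
    `target`, so nearby locations dominate the local coefficients.
    Returns (intercept, slope).
    """
    weights = []
    for px, py in points:
        d = math.hypot(px - target[0], py - target[1])
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical grid cells: a western group where y = x and an eastern group where y = 3x.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (100, 0), (100, 1), (101, 0), (101, 1)]
xs = [1, 2, 3, 4, 1, 2, 3, 4]
ys = [1, 2, 3, 4, 3, 6, 9, 12]

west = gwr_local_fit(points, xs, ys, target=(0.5, 0.5), bandwidth=5.0)
east = gwr_local_fit(points, xs, ys, target=(100.5, 0.5), bandwidth=5.0)
# A very large bandwidth weights every cell almost equally, approximating a global fit.
pooled = gwr_local_fit(points, xs, ys, target=(50.0, 0.5), bandwidth=1e9)
```

Here the pooled fit averages the two regimes into a single slope, while the local fits recover a different slope on each side of the study area, which is the kind of spatial variation a global model cannot express.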