A Cluster-Based Machine Learning Ensemble Approach for Geospatial Data: Estimation of Health Insurance Status in Missouri

Mueller, Erik D.; Sandoval, J. S. Onésimo; Mudigonda, Srikanth; Elliott, Michael

doi:10.3390/ijgi8010013

Cited by 12 publications

(15 citation statements)

References 16 publications

“…The algorithms developed, along with methods for determining which variable, or combination of variables, to use as the clustering variable, were demonstrated on a dataset where the objective was to determine the algorithms' ability to predict the distribution of health insurance status across the state of Missouri using demographic, socioeconomic, access, and relative-location information as independent variable loadings. Results from the research found that using a cluster-based ensemble approach outperformed all other modeling techniques evaluated in the study, including linear and nonlinear base-learning algorithms and three aggregate ensemble techniques [15]. In addition to improved predictive performance, the cluster-based ensemble also provided increased inferential power in assessing relative variable importance.…”

Section: Introduction (mentioning)

confidence: 94%

“…Building on methods from Trivedi et al [23] and previous research of Mueller et al [15] that examined the relationship between cluster analysis as a preprocessing technique and predictive performance associated with various base learner and ensemble machine learning models, this study sought to enhance algorithmic performance of the ensemble methodology for geospatial data through a technique known as synthetic population generation. The algorithms developed, along with methods for determining which variable, or combination of variables, to use as the clustering variable, were demonstrated on a dataset where the objective was to determine the algorithms' ability to predict the distribution of health insurance status across the state of Missouri using demographic, socioeconomic, access, and relative-location information as independent variable loadings.…”

Section: Introduction (mentioning)

confidence: 99%

“…Research objectives outlined for this study were direct extensions of previous research, where stability and sensitivity testing sought to validate the previously developed cluster-based ensemble technique while expanding the method by providing an improved workflow for examining global as well as localized variable importance. Mueller et al [15] demonstrated the newly developed cluster-based ensemble method on a single dataset, examining health insurance coverage with carefully chosen independent variables backed by qualitative theory relating demographic and socioeconomic information, physical location, and access to care, to the distribution of individuals in Missouri lacking health insurance. While the cluster-based ensemble outperformed other ensemble and base-learning algorithms for the health insurance dataset, the first objective in this study seeks to better determine whether the algorithm can be applied to other datasets where the underlying characteristics may vary in comparison to the dataset used in the original study.…”

Section: Introduction (mentioning)

confidence: 99%

Extending cluster-based ensemble learning through synthetic population generation for modeling disparities in health insurance coverage across Missouri

et al. 2019

Self Cite

In a previous study, Mueller et al. (ISPRS Int J Geo-Inf 8(1):13, 2019) presented a machine learning ensemble algorithm using K-means clustering as a preprocessing technique to increase predictive modeling performance. As a follow-on research effort, this study seeks to test the previously introduced algorithm's stability and sensitivity, as well as present an innovative method for the extraction of localized and state-level variable importance information from the original dataset, using a nontraditional method known as synthetic population generation. Through iterative synthetic population generation with similar underlying statistical properties to the original dataset and exploration of the distribution of health insurance coverage across the state of Missouri, we identified variables that contributed to decisions for clustering, variables that contributed most significantly to modeling health insurance distribution status throughout the state, and variables that were most influential in optimizing model performance, having the greatest impact on change-in-mean-squared-error (MSE) measurements. Results suggest that cluster-based preprocessing approaches for machine learning algorithms can result in significantly increased performance, and also demonstrate how synthetic populations can be used for performance measurement to identify and test the extent to which variable statistical properties within a dataset can vary without resulting in significant performance loss.
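
The idea of clustering as a preprocessing step for prediction, as described in the abstract above, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single clustering variable, a one-dimensional K-means, and one ordinary-least-squares line per cluster in place of the base learners and ensembles used in the cited work. All function names and the toy rows are hypothetical and invented for illustration; none of the numbers come from the study.

```python
from statistics import mean

def nearest(centroids, v):
    """Index of the centroid closest to the scalar v."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - v))

def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalars with a deterministic, evenly spaced start."""
    s = sorted(values)
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centroids, v)].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; constant model if x never varies."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def fit_cluster_ensemble(rows, k):
    """rows are (cluster_var, x, y); returns centroids and one (a, b) per cluster."""
    centroids = kmeans_1d([r[0] for r in rows], k)
    fallback = fit_line([r[1] for r in rows], [r[2] for r in rows])
    models = []
    for i in range(k):
        members = [r for r in rows if nearest(centroids, r[0]) == i]
        if members:
            models.append(fit_line([r[1] for r in members], [r[2] for r in members]))
        else:
            models.append(fallback)
    return centroids, models

def predict(centroids, models, cluster_var, x):
    """Route the record to its nearest cluster and apply that cluster's line."""
    a, b = models[nearest(centroids, cluster_var)]
    return a + b * x

def mse(preds, ys):
    return mean([(p - y) ** 2 for p, y in zip(preds, ys)])

# Toy data (invented): two "regions" whose x-to-y relationship differs.
rows = [(0.0 + 0.1 * i, float(i), 1 + 2 * i) for i in range(5)] + \
       [(10.0 + 0.1 * i, float(i), 20 - 3 * i) for i in range(5)]
ys = [r[2] for r in rows]

centroids, models = fit_cluster_ensemble(rows, k=2)
cluster_mse = mse([predict(centroids, models, r[0], r[1]) for r in rows], ys)

# Baseline: one line fitted to all rows, ignoring the clustering variable.
a, b = fit_line([r[1] for r in rows], ys)
global_mse = mse([a + b * r[1] for r in rows], ys)

print("cluster-based MSE:", cluster_mse)
print("single-model MSE:", global_mse)
```

On this toy data the per-cluster models recover each region's relationship, while the single global line averages the two and fits neither. Real data would need held-out evaluation rather than the in-sample comparison shown here.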

“…It would be tautological to argue that such data would be utilised subsequently to determine the subsequent set of premiums (once a customer was onboard) to be paid by the customers. This could be carried out based upon machine learning (ML) or artificial intelligence (AI) based predictive analytics (Kose et al, 2015; Kuo et al, 2007; Mueller et al, 2019). Once the factor inputs data (a large set of it) was available, ML and AI could help HI managers to ascertain the most appropriate premium values.…”

Section: Futuristic Perspective (mentioning)

confidence: 99%

The Changing Narrative in the Health Insurance Industry: Wearables Technology in Health Insurance Products and Services for the COVID-19 World

Nayak

Bhattacharyya

2020

Journal of Health Management

The COVID-19 pandemic, the associated economic lockdown and the norms of social distancing have disrupted the business world. Most managers have struggled to make sense of the chaos and complexity around them. Health insurance industry managers are at the forefront of this challenge, as new products and services covering COVID-19 had to be launched fast. This was both a market and a societal requirement. In the COVID-19 world, in countries such as the United States of America (USA), the United Kingdom (UK), Germany and India, attempts are being made to develop mobile applications for tracking COVID-19 patients. Emerging technologies have been altering the business landscape in most industries. The health insurance industry has also been witnessing the effects of technologies such as wearables technology, big data analytics, cloud technologies, blockchain and machine learning. The advent of these technologies is fundamentally changing the health insurance industry. Given the realities of the COVID-19 world, the health insurance industry is poised at a crossroads of evolution where the industry would become data-intensive and data-driven. Health insurance firms have to enter into interfirm collaboration with wearable technology firms in the conversation on tracking social distancing from COVID-19 positive and potential cases. Health insurance firms might develop a service mechanism which could, while maintaining the anonymity of COVID-19 positive or potential cases, ensure that customers who use wearable technology products and follow social distancing norms are offered a favourable premium on COVID-19 related health insurance products in case they are infected. This would be a novel addition to the COVID-19 related products of health insurance firms. Deliberating on these aspects in this article, the authors propose a fundamental shift in the strategic orientation of health insurance firms.

“…As an important data mining task, clustering is useful for exploring patterns in GTS by assigning similar data elements into the same cluster and dissimilar elements into different ones [7,8]. As a result, it provides both an overview of data at cluster levels and investigation of details on single clusters [9,10].…”

Section: Introduction (mentioning)

confidence: 99%

Tri-Clustering Based Exploration of Temporal Resolution Impacts on Spatio-Temporal Clusters in Geo-Referenced Time Series

Zheng

2020

IJGI

Unprecedented amounts of spatio-temporal data create an urgent need for pattern exploration. Clustering analysis is useful in extracting patterns from big data by grouping similar data elements into clusters. Compared with one-way clustering and co-clustering methods, tri-clustering methods are more capable of exploring complex patterns. However, the explored patterns or clusters could differ with the temporal resolution of the input data. This study presents a tri-clustering based method to explore the impacts of different temporal resolutions on spatio-temporal clusters identified in geo-referenced time series (GTS), one type of spatio-temporal data. Dutch daily temperature data at 28 stations over 20 years were used to illustrate this study. The temperature data at daily, monthly, and yearly resolutions were subjected to the Bregman cube average tri-clustering algorithm with I-divergence (BCAT_I) to detect spatio-temporal clusters, which were then compared in terms of patterns exhibited, compositions, and changed elements. Results confirm the impact of temporal resolution on the spatio-temporal clusters identified in the Dutch temperature data: the compositions of most clusters vary when the temporal resolution of the input data in the GTS changes. Nevertheless, there is almost no change of elements in certain clusters (12 stations in the northeast of the country; years 1996, 2010) at all temporal resolutions, suggesting them as the “true” clusters in the case study dataset.
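
For reference, the I-divergence named in the abstract above is commonly defined as the generalized Kullback-Leibler divergence between two non-negative arrays. The notation below (arrays x and y, element index i) is an assumption for illustration and is not taken from the cited paper, which may write it differently.

```latex
D_I(x \,\|\, y) \;=\; \sum_{i} \left( x_i \log\frac{x_i}{y_i} \;-\; x_i \;+\; y_i \right),
\qquad x_i \ge 0,\; y_i > 0
```

The divergence is zero when x and y agree element by element and grows as they diverge.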
