2021
DOI: 10.6339/21-jds996

Hybrid Density- and Partition-Based Clustering Algorithm for Data With Mixed-Type Variables

Abstract: Clustering is an essential technique for discovering patterns in data. Many clustering algorithms have been developed to tackle the ever increasing quantity and complexity of data, yet algorithms that can cluster data with mixed variables (continuous and categorical) remain limited despite the abundance of mixed-type data. Of the existing clustering methods for mixed data types, some posit unverifiable distributional assumptions or rest on unbalanced contributions of different variable types. To address these …

Cited by 6 publications (7 citation statements)
References 20 publications
“…First, to determine whether the results were sensitive to the matching method, we performed three sensitivity analyses: 1) we estimated the effect of metformin on 90-day mortality and AKI across strata using propensity score stratification and using weighting on the propensity scores by inverse propensity score and standardized mortality ratio weighting methods (see details in eMethods, http://links.lww.com/CCM/H40); 2) we estimated the association between exposure to metformin and 90-day mortality in the entire population using multivariable logistic regression adjusted by multiple covariates (see details in eMethods, http://links.lww.com/CCM/H40); and 3) we estimated the potential effect of residual confounding by calculating E-values. Second, to investigate the effect of the healthy user bias on 90-day mortality, we clustered patients in different health status groups using consensus k-means agglomerative algorithms (28) based on 43 clinical and laboratory covariates (eTable 5 and eMethods, http://links.lww.com/CCM/H40). Differences in health status between clusters were analyzed and represented using alluvial plots for quartiles of age, Charlson index, hemoglobin A1c, reference creatinine, and Acute Physiology and Chronic Health Evaluation III score.…”
Section: Methods (mentioning)
confidence: 99%
“…lww.com/CCM/H40); and 3) we estimated the potential effect of residual confounding by calculating E-values. Second, to investigate the effect of the healthy user bias on 90-day mortality, we clustered patients in different health status groups using consensus k-means agglomerative algorithms (28) based on 43 clinical and laboratory covariates (eTable 5 and eMethods, http://links.lww.com/CCM/H40).…”
Section: Sensitivity Analysis (mentioning)
confidence: 99%
“…In order to obtain robust clustering findings, HyDaP incorporates participant characteristics from all assessment cycles. The algorithm can also identify variables that are the most important for clustering (15). The subject in this method was each participant at every assessment, so that participants who remained in the study longer provided more data points.…”
Section: Methods (mentioning)
confidence: 99%
“…Only the publications classified as case studies within the previous process (n = 104) were included in the analysis. Since the data are mostly categorical, the distance matrix was calculated using Gower distance that measures the dissimilarity of two items with mixed numeric and non-numeric data (Wang et al., 2021). The data were categorized based on the thematic focus of the output (11 levels), target groups (10 levels), geographical area (29 levels), publication type (8 levels), source (64 levels) and year (2006–2021).…”
Section: Research Stages (mentioning)
confidence: 99%
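
The Gower distance mentioned in the statement above can be illustrated with a minimal sketch. It is not the code used by the citing study or by the HyDaP authors. It assumes the standard unweighted definition: a numeric field contributes its range-scaled absolute difference, a categorical field contributes 0 on a match and 1 on a mismatch, and the contributions are averaged. The function name `gower_distance_matrix`, the three example records, and the choice to treat year as numeric are all hypothetical.

```python
def gower_distance_matrix(rows, numeric_cols):
    """Pairwise Gower dissimilarity for records with mixed-type fields.

    rows: list of equal-length tuples (hypothetical example data below).
    numeric_cols: set of column indices treated as numeric; every other
    column is treated as categorical.
    """
    n_cols = len(rows[0])

    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for j in numeric_cols:
        values = [r[j] for r in rows]
        ranges[j] = max(values) - min(values)

    def pair(a, b):
        total = 0.0
        for j in range(n_cols):
            if j in numeric_cols:
                # Range-scaled absolute difference; a constant column adds 0.
                if ranges[j] > 0:
                    total += abs(a[j] - b[j]) / ranges[j]
            else:
                # Simple matching: 0 if the categories agree, 1 otherwise.
                total += 0.0 if a[j] == b[j] else 1.0
        # Unweighted average over all fields.
        return total / n_cols

    return [[pair(a, b) for b in rows] for a in rows]


# Hypothetical records: (publication type, geographical area, year).
records = [
    ("report", "urban", 2006),
    ("report", "rural", 2021),
    ("article", "urban", 2016),
]
D = gower_distance_matrix(records, numeric_cols={2})
```

The resulting matrix `D` is symmetric with a zero diagonal and can be passed to any clustering routine that accepts a precomputed dissimilarity matrix. Treating year as a categorical field instead (by passing `numeric_cols=set()`) would make every field contribute only 0 or 1.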