k-means is a popular clustering algorithm because of its simplicity and its scalability to large datasets. However, one of its drawbacks is the difficulty of identifying the correct value for the k-hyperparameter. Tuning this value correctly is critical for building effective k-means models. The traditional elbow method for identifying this value has a long-standing literature. However, on certain datasets the method produces smooth curves with no clearly visible elbow, making the k-value hard to identify. Various internal validation indexes, proposed as a solution to this issue, may in turn be inconsistent with one another. Although several techniques for handling the smooth-elbow problem exist, k-hyperparameter tuning in high-dimensional spaces remains intractable and an open research issue. In this paper, we first review the existing techniques for addressing smooth-elbow challenges. The identified research gaps are then used to develop a new technique, referred to as an ensemble-based technique of a self-adapting autoencoder and internal validation indexes, which is validated on high-dimensional space clustering. The optimal k-value, tuned by this technique using a voting scheme, is a trade-off between the number of clusters visualized in the autoencoder's latent space, the k-value from the ensemble internal validation index score, and the k-value that yields a derivative of 0, or close to 0, at the elbow. Experimental results based on Cochran's Q test, ANOVA, and McNemar's test indicate relatively good performance of the newly developed technique in k-hyperparameter tuning.
KEYWORDS: k-hyperparameter tuning; high-dimensional; smooth elbow
K-Means Architecture

K-means is an iterative algorithm that aims to partition a dataset into a set of k non-overlapping groups of data points [9]. The k-hyperparameter is one of the most important hyperparameters to tune in k-means [10,11], and tuning a machine learning model's hyperparameters has a significant effect on its performance [12]. In this subsection, we explore k-clusters, the k-hyperparameter, and the traditional elbow method used to identify its optimal value.
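As an illustrative sketch of the traditional elbow method discussed here (variable names and the elbow heuristic below are our own, not from the paper), one can fit k-means over a range of candidate k-values, record the inertia (within-cluster sum of squared distances), and look for the point where the curve's decrease flattens:

```python
# Illustrative elbow-method sketch using scikit-learn (our own example setup).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated clusters, so the elbow is unambiguous.
X, _ = make_blobs(n_samples=200,
                  centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
                  cluster_std=0.5, random_state=0)

ks = range(1, 9)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]

# Simple elbow heuristic: the k after which the drop in inertia shrinks most.
drops = np.diff(inertias)           # negative; large magnitude = big improvement
rel_gain = drops[:-1] / drops[1:]   # ratio of successive drops
elbow_k = int(np.argmax(rel_gain)) + 2  # +2: diff shifts the index; ks start at 1
print(elbow_k)
```

On a smooth curve, however, `rel_gain` has no dominant peak, which is exactly the failure mode motivating this paper.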
K-Means Clusters

The k-means clusters are the data sub-groups generated by the popular unsupervised partitioning algorithm known as the k-means clustering algorithm [13]. Cluster analysis with k-means is an example of a k-means-based model that has been applied successfully in many domains [14,15]. The k-clusters generated by this algorithm are distinct, non-overlapping groups of data points, aggregated together because they share specific similarities [16]. The data points within a particular cluster are similar, while data points across different clusters are dissimilar [17]. Both the intra-cluster and inter-cluster distances are measured using a sum-of-squared-distances metric [18,19]. For this reason, the original un-partitioned dataset is stan...
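The sum-of-squared-distances metric above can be sketched in a few lines of NumPy (the `wcss` helper and the toy data are our own illustrative names, not from the paper): it sums, over every cluster, the squared distance of each member point to that cluster's centroid, which is the quantity k-means minimizes.

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-cluster sum of squared distances (the k-means objective)."""
    return sum(np.sum((X[labels == k] - c) ** 2)
               for k, c in enumerate(centroids))

# Tiny example: two obvious clusters on a line.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.5], [10.5]])
print(wcss(X, labels, centroids))  # each point is 0.5 from its centroid: 4 * 0.25 = 1.0
```

A low intra-cluster sum paired with a high inter-cluster separation is the pattern the validation indexes discussed later try to quantify.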