2016
DOI: 10.5120/ijca2016910868
Comparison between Standard K-Mean Clustering and Improved K-Mean Clustering

Abstract: Clustering in data mining is important for discovering distribution patterns, and its importance grows as the amount of data increases. It is one of the main analytical methods in data mining, and the choice of clustering method directly influences the results. K-means is a typical clustering algorithm [3]. It consists mainly of two phases: initializing random cluster centers and assigning each point to its nearest center. Both phases have shortcomings, which are discussed in the paper, and two methods are proposed to address them. First…
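The two phases named in the abstract can be sketched as follows. This is a minimal illustration of standard K-means, not the paper's improved algorithm; the function name, seeding, and fixed iteration count are placeholders:

```python
import random
import math

def kmeans(points, k, iters=20, seed=0):
    """Minimal standard K-means sketch.
    Phase 1: random initialization of cluster centers.
    Phase 2: nearest-center assignment, then centroid update."""
    rng = random.Random(seed)
    # Phase 1: pick k distinct data points as initial centers.
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Phase 2: assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Recompute each center as the mean of its assigned points.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return centers, clusters
```

The random initialization in phase 1 is exactly the shortcoming the abstract points to: different seeds can yield different final clusterings.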

Cited by 8 publications (7 citation statements)
References 5 publications
“…It is useful for discretizing continuous variables because it computes a continuous distance-based similarity measure to cluster data points [69]. It originates in signal processing and aims to partition observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the cluster’s prototype [70]. The discretization strategy for the input data uses the maximum and minimum dataset values, the computed cluster centers, and the midpoints between every two adjacent clusters.…”
Section: Methods
confidence: 99%
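The discretization strategy in this excerpt can be sketched as follows, assuming one-dimensional data: bin edges are formed from the dataset minimum, the midpoints between adjacent sorted cluster centers, and the dataset maximum. The function names are illustrative, not from the cited work:

```python
def midpoint_bins(values, centers):
    """Build discretization bin edges from the dataset min/max and
    the midpoints between every two adjacent cluster centers."""
    cs = sorted(centers)
    mids = [(a + b) / 2 for a, b in zip(cs, cs[1:])]
    return [min(values)] + mids + [max(values)]

def discretize(value, edges):
    """Map a continuous value to the index of the bin it falls in."""
    for i in range(len(edges) - 1):
        if value <= edges[i + 1]:
            return i
    return len(edges) - 2  # clamp values above the last edge
```

For example, with cluster centers at 1 and 9, the single midpoint 5.0 splits the range into two bins, so values near each center are mapped to that center's discrete label.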
“…A data point is assigned to the cluster whose centroid is closest to it. The algorithm then computes a new centroid for each cluster and repeats until the best centroids are found [25]. Like K-means, BIRCH is driven by a predefined number of clusters.…”
Section: Clustering Algorithm
confidence: 99%
“…Each cluster has a centroid (center), and each member object of a cluster has the minimum distance to that centroid and is farther from the centroids of the other clusters. The standard K-means algorithm uses Euclidean distance to compute the distance between each object and the centroid [50]. We propose Algorithm 3, which employs the standard K-means algorithm to support our goal in this step.…”
Section: Clustering-based K-means Algorithm
confidence: 99%
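The Euclidean-distance assignment rule described in this excerpt can be sketched in a few lines; the function name is a placeholder, not Algorithm 3 from the citing paper:

```python
import math

def nearest_centroid(obj, centroids):
    """Return the index of the centroid with the minimum Euclidean
    distance to the given object (the standard K-means assignment)."""
    distances = [math.dist(obj, c) for c in centroids]
    return distances.index(min(distances))
```

Assigning every object this way guarantees each member has minimum distance to its own centroid relative to all other centroids, which is the membership property stated above.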