Open issues for partitioning clustering methods: an overview

Barioni, Maria Camila Nardini; Razente, Humberto Luiz; Marcelino, Alessandra M. R.; Traina, Agma J. M.; Traina, Caetano

doi:10.1002/widm.1127

Cited by 16 publications

(9 citation statements)

References 93 publications


“…Clustering is an unsupervised approach of machine learning, and it groups similar objects into a cluster. The most representative clustering algorithm is partitional clustering such as k-means and k-medoids [27], and each cluster has a center called centroid in partitional clustering. Mei and Chen [28] proposed a clustering around weighted prototypes (CAWP) based on new cluster representation method, where each cluster was represented by multiple objects with various weights.…”

Section: Clustering Algorithm (mentioning)

confidence: 99%

A Novel Text Clustering Approach Using Deep-Learning Vocabulary Network

Zhang, Zhao, et al. 2017

Mathematical Problems in Engineering


Text clustering is an effective approach to collect and organize text documents into meaningful groups for mining valuable information on the Internet. However, there exist some issues to tackle such as feature extraction and data dimension reduction. To overcome these problems, we present a novel approach named deep-learning vocabulary network. The vocabulary network is constructed based on related-word set, which contains the “cooccurrence” relations of words or terms. We replace term frequency in feature vectors with the “importance” of words in terms of vocabulary network and PageRank, which can generate more precise feature vectors to represent the meaning of text clustering. Furthermore, sparse-group deep belief network is proposed to reduce the dimensionality of feature vectors, and we introduce coverage rate for similarity measure in Single-Pass clustering. To verify the effectiveness of our work, we compare the approach to the representative algorithms, and experimental results show that feature vectors in terms of deep-learning vocabulary network have better clustering performance.
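The citation statement for this entry describes partitional clustering, in which each cluster is represented by a centroid. As an illustration only, here is a minimal sketch of plain k-means (Lloyd's iteration) in standard-library Python. It is not the method of the citing paper or of the cited overview; the function names `squared_distance` and `kmeans`, the parameters, and the random initialization are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Illustrative k-means: returns (centroids, assignment).

    Assumption: initial centroids are k distinct points sampled at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
```

On this toy input the two well-separated groups end up in different clusters. In general the result of k-means depends on the initial centroids, which is one reason initial center selection appears among the open issues mentioned in these citation statements.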


“…One of the most concerned issues of partitioning clustering methods is coping with very large datasets. Various methods were proposed to use for dealing with very large dataset clustering such as dataset size reduction, using representative samples, parallelization, and better initial center selection [2,3,4]. However, these methods can not completely solve the problem "process masses of heterogeneous data within a limited time" [5].…”

Section: Introduction (mentioning)

confidence: 99%

Fast K-Means Clustering for Very Large Datasets Based on MapReduce Combined with a New Cutting Method

Hieu, Meesad 2015

Advances in Intelligent Systems and Computing


Clustering very large datasets is a challenging problem for data mining and processing. MapReduce is considered as a powerful programming framework which significantly reduces executing time by dividing a job into several tasks and executes them in a distributed environment. K-Means which is one of the most used clustering methods and K-Means based on MapReduce is considered as an advanced solution for very large dataset clustering. However, the executing time is still an obstacle due to the increasing number of iterations when there is an increase of dataset size and number of clusters. This paper presents a new approach for reducing the number of iterations of K-Means algorithm which can be applied to very large dataset clustering. This new method can reduce up to 30 percent of iterations while maintaining up to 98 percent accuracy when tested with several very large datasets with real data type attributes. Based on the significant results from the experiments, this paper proposes a new fast K-Means clustering method for very large datasets based on MapReduce combined with a new cutting method (abbreviated to FMR.K-Means).
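The abstract above refers to K-Means expressed as MapReduce tasks. A rough single-process sketch of one such iteration follows: the map phase assigns the points of each data chunk to their nearest centroid and emits partial sums, and the reduce phase merges those sums into new centroids. This is the generic pattern only; it does not implement FMR.K-Means or its cutting method, which the abstract does not specify. The names `map_phase`, `reduce_phase`, and `kmeans_iteration` are hypothetical, and no real MapReduce framework is used.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((x - c) ** 2 for x, c in zip(point, centroids[j])),
    )


def map_phase(chunk, centroids):
    """Map one chunk to {centroid index: (coordinate sums, point count)}."""
    partial = {}
    for p in chunk:
        j = nearest(p, centroids)
        sums, count = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([s + x for s, x in zip(sums, p)], count + 1)
    return partial


def reduce_phase(partials, centroids):
    """Merge the per-chunk partial sums and compute the new centroids."""
    totals = {}
    for partial in partials:
        for j, (sums, count) in partial.items():
            old_sums, old_count = totals.get(j, ([0.0] * len(sums), 0))
            totals[j] = ([a + b for a, b in zip(old_sums, sums)], old_count + count)
    new_centroids = list(centroids)  # a centroid with no points stays unchanged
    for j, (sums, count) in totals.items():
        new_centroids[j] = tuple(s / count for s in sums)
    return new_centroids


def kmeans_iteration(chunks, centroids):
    """One k-means iteration; in a cluster, each map call would run on a separate worker."""
    return reduce_phase([map_phase(chunk, centroids) for chunk in chunks], centroids)


chunks = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
updated = kmeans_iteration(chunks, [(0, 0), (10, 10)])
```

Because every iteration is a full pass over all chunks, total running time grows with the number of iterations, which is the cost the abstract says the proposed method aims to reduce.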


“…In order to tackle these issues, different research fields have considered the use of strategies that allow interfering somehow in the clustering process guiding it to a desirable or more suitable data partition [1]. Among them there is semi-supervised (or constrained) clustering.…”

Section: Introduction (mentioning)

confidence: 99%

Semi-supervised clustering using multi-assistant-prototypes to represent each cluster

Silva, Barioni, Amo, et al. 2015

Proceedings of the 30th Annual ACM Symposium on Applied Computing

Self Cite


The incorporation of semi-supervision in the cluster detection process has proved especially useful when one wants to get a high consistency between the data partitioning and the knowledge the user has about the data domain. In recent years, several strategies for semi-supervised clustering have been proposed. The approaches adopted by these strategies aim at guiding the process of cluster detection by using constraints: to interfere with the allocation of elements to the most appropriate cluster at each iteration of the algorithm; or to modify the objective function employed. This paper proposes a novel approach for incorporating semi-supervision in the well-known k-means algorithm. This semi-supervised clustering method employs constraint information in the definition of multiple assistant representatives for the centroids used at each iteration of k-means. A refinement process is designed to reduce the number of assistant representatives considered for each centroid without losing the clustering quality. The experimental results with eight synthetic datasets show the potential of the proposed approach for dealing with complex data structures composed by clusters of different shapes.
