Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011
DOI: 10.1145/2020408.2020515

Fast clustering using MapReduce

Abstract: Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming environment for processing large datasets. We focus on the practical and popular clustering problems, k-center and k-median. We develop fast clustering algorithms with constant factor approximation guarantees. From a theoretical perspective, we give the first analysis that shows …

Cited by 169 publications (137 citation statements)
References 35 publications
“…Distributed versions of clustering algorithms related to kernel k-Means, like classic k-Means [86] and k-Medians [87] have already been proposed. However, to the best of our knowledge, a distributed approach to kernel k-Means has not been proposed yet.…”
Section: B Distributed Trimmed Kernel K-means Clustering (mentioning)
confidence: 99%
“…MapReduce framework is widely used for processing and managing large data sets in a distributed cluster, which has been used for numerous applications such as, document clustering, access log analysis, generating search indexes and various other data analytical operations. A host of literature is present in recent years for performing Big Data clustering using MapReduce framework [3, 4, 13–16]. A modified K-means clustering algorithm based on MapReduce framework is proposed by Li et al [17] to perform clustering on large data sets.…”
Section: Background and Literature Review (mentioning)
confidence: 99%
“…Papadimitriou et al presented the distributed co-clustering framework which introduced practical approaches for distributed data preprocessing and co-clustering [11]. Ene et al proposed the fast clustering using MapReduce [12] by adopting a MapReduce sampling technique to decrease the data size. The result of this method was applied to k-center and k-median algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…We have discussed above, this algorithm needs too many MapReduce jobs. The research in [12] proposed a fast clustering scheme which uses the sampling technology. This paper also proved that the MapReduce-KCenter was 4α + 2 for the k-center problem and MapReduce-KMedian was 10α + 3 approximation for the k-median problem.…”
Section: Related Work (mentioning)
confidence: 99%
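
The citation statements above describe a sample-then-cluster idea: shrink the data by sampling, then run a k-center algorithm on the smaller set. The following is a minimal single-machine sketch of that idea, not the paper's MapReduce-KCenter procedure. It assumes Euclidean points, a plain uniform random sample (the paper's sampling scheme is not reproduced here), and the classic greedy farthest-first traversal as the k-center routine. All function names are hypothetical.

```python
import math
import random


def farthest_first_centers(points, k):
    """Greedy farthest-first traversal: choose up to k centers from points."""
    centers = [points[0]]
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # The next center is the point farthest from all current centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def kcenter_cost(points, centers):
    """k-center objective: the largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def sample_then_cluster(points, k, sample_size, seed=0):
    """Cluster a uniform random sample instead of the full point set (assumption)."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return farthest_first_centers(sample, k)


if __name__ == "__main__":
    # Three well-separated groups of two points each (illustrative data).
    pts = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
    centers = sample_then_cluster(pts, k=3, sample_size=4)
    print(centers, kcenter_cost(pts, centers))
```

In this sketch the cost is always evaluated on the full point set, so a sample that misses a whole group shows up as a large cost. That is the trade-off a sampling-based scheme has to control.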