Single-pass and linear-time k-means clustering based on MapReduce

Shahrivari, Saeed; Jalili, Saeed

doi:10.1016/j.is.2016.02.007

Cited by 58 publications

(25 citation statements)

References 25 publications


“…To deal with large-scale data, several clustering methods based on parallel frameworks have been designed in the literature (Bahmani et al. 2012; Hadian and Shahrivari 2014; Kim et al. 2014; Ludwig 2015; Shahrivari and Jalili 2016; Zhao et al. 2009). Most of these methods use the MapReduce framework.…”

Section: Related Work (mentioning)

confidence: 99%

“…Bahmani et al. have proposed a scalable k-means (Bahmani et al. 2012) that extends the k-means++ technique for initial seeding. Shahrivari and Jalili (2016) have proposed a single-pass and linear-time MapReduce-based k-means method. Kim et al. (2014) have proposed parallelizing density-based clustering with MapReduce.…”

Section: Related Work (mentioning)

confidence: 99%

“…According to Shahrivari and Jalili (2016), the most memory-efficient value for the chunk size is √(k·n), because this chunk size generates a set of intermediate centers of size √(k·n). That is to say, the chunk size should not be set to a value greater than √(k·n).…”

Section: Tuning the Chunk Size (mentioning)

confidence: 99%
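
The chunk-size rule quoted above (chunk size equal to the square root of k·n) can be checked with a little arithmetic. The following is a minimal sketch, assuming that each chunk is reduced to exactly k intermediate centers; that assumption, the function name `chunk_plan`, and its return layout are illustrative and do not come from the cited papers.

```python
import math


def chunk_plan(n, k):
    """Return (chunk_size, num_chunks, num_intermediate_centers).

    Assumption (hypothetical model): every chunk is clustered
    independently and contributes exactly k intermediate centers.
    """
    # Chunk size from the quoted rule: floor(sqrt(k * n)), but never
    # smaller than k so that each chunk can still yield k centers.
    chunk_size = max(k, math.isqrt(k * n))
    num_chunks = math.ceil(n / chunk_size)
    return chunk_size, num_chunks, num_chunks * k


# One million points, 100 clusters.
size, chunks, centers = chunk_plan(1_000_000, 100)
print(size, chunks, centers)
```

Under this model, a chunk and the set of intermediate centers have about the same size, so neither dominates memory. A larger chunk size would produce fewer intermediate centers but require more memory per chunk.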


One-pass MapReduce-based clustering method for mixed large scale data

2017


Big data is often characterized by a huge volume and mixed types of attributes, namely numeric and categorical. K-prototypes has been fitted into the MapReduce framework and has hence become a solution for clustering mixed large-scale data. However, k-prototypes requires computing all distances between each of the cluster centers and the data points. Many of these distance computations are redundant, because data points usually stay in the same cluster after the first few iterations. Also, k-prototypes is not well suited to running within the MapReduce framework: the iterative nature of k-prototypes cannot be modeled efficiently through MapReduce, since at each iteration the whole data set must be read from and written to disk, which results in a high number of input/output (I/O) operations. To deal with these issues, we propose a new one-pass accelerated MapReduce-based k-prototypes clustering method for mixed large-scale data. The proposed method reads and writes data only once, which greatly reduces the I/O operations compared to the existing MapReduce implementation of k-prototypes. Furthermore, the proposed method is based on a pruning strategy that accelerates the clustering process by reducing the redundant distance computations between cluster centers and data points. Experiments performed on simulated and real data sets show that the proposed method is scalable and improves the efficiency of the existing k-prototypes methods.




“…The K-Means algorithm, similar to the K-modes algorithm, first selects k samples as centroids and uses Euclidean distance as the similarity measure; for each remaining sample, it calculates the distance to each centroid and assigns the sample to the nearest centroid, and finally recalculates the centroids [6,7]. This iterates until the centroids no longer change.…”

Section: Clustering Based Recommendation Algorithm (mentioning)

confidence: 99%
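
The steps described in the statement above (pick k samples as centroids, assign every sample to its nearest centroid by Euclidean distance, recompute the centroids, repeat until nothing changes) can be sketched as follows. This is a generic illustration of standard k-means, not the code of any cited paper; the function name `kmeans`, the `max_iter` cap, and the fixed `seed` are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: select k samples as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest centroid,
        # using Euclidean distance as the similarity measure.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 3: recompute each centroid as the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = kmeans(data, k=2)
print(sorted(centers))
```

Every iteration scans the whole data set, which is why an iterative algorithm of this kind maps poorly onto a framework that reads and writes the data on each pass.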

Personalized Music Recommendation Based on Clustering Algorithm

Zhang, Wang, Lv et al. 2017

Intelligent Computing and Information Engineering (ICIE)


Aimed at recommendation in the music field, this paper proposes a music recommendation algorithm based on attribute selection and clustering. First, the progress of music recommendation is analyzed in depth, focusing on building music attributes and on the problems of interaction in the field of music recommendation. Clustering is the main method, since more accurate clustering makes the recommendation more precise. Based on attribute building and clustering, the overall recommendation scheme is designed, and the music is clustered by attribute judgment. Experimental results show that the proposed music recommendation algorithm achieves a better recommendation effect and can effectively improve the user experience.


“…Therefore, it is important to study disk-based graph mining algorithms, or graph mining algorithms based on some parallel processing model, such as the DNA model [19], MapReduce [20], etc.…”

Section: The Challenge Of Graph Data Mining (mentioning)

confidence: 99%