2013
DOI: 10.1186/2192-113x-2-18

Efficient parallel spectral clustering algorithm design for large data sets under cloud computing environment

Abstract: The spectral clustering algorithm has proved to be more effective than most traditional algorithms in finding clusters. However, its high computational complexity limits its use in practical applications. This paper combines spectral clustering with MapReduce: through sparse-matrix eigenvalue evaluation and distributed cluster computation, it puts forward improvement ideas and a concrete realization, and thus improves the clustering speed of the distinctive clustering algorithm. According to the experiment, w…

Citations: cited by 19 publications (8 citation statements)
References: 8 publications
“…Similarly, several real-time cloud data mining frameworks, algorithms, and services are available that provide information through a number of applications [44]. These real-time cloud data mining and load balancing applications are VM tasks classification for load balancing, feature extraction, anomaly, and intrusion detection, open shop scheduling, attribute importance, spatial classifications, data analysis and satellite imagery [45], spectral and statistical data analysis [46], gene expression data mining and bioinformatics [47], geo-spatial analysis and geoinformatics, large-scale mining in big data and web mining [48], machine-learning applications [49], high-dimensional data mining [50], highly diversified and dense data mining in rule mining [51], the security of data in the cloud, clustering, datacenters resources optimization in the cloud, noise removal, reactive power problem, face recognition, biomedical image processing, teaching based learning, manufacturing design, water resource problem and routing optimization. This research is mainly focused on classification, specifically the combination of classifier with a load balancer in the cloud.…”
Section: Literature Review (mentioning)
confidence: 99%
“…(2) Analysis of scalability. According to the paper [Ran Jin, 37], the formula is $\eta = S_p / N$, where $S_p$ represents the speedup ratio and $N$ is the number of cluster nodes. Fig.…”
Section: (1) Test of Speedup Ratio (mentioning)
confidence: 99%
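The efficiency formula quoted above can be illustrated with a short calculation. This is a minimal sketch, not code from the cited paper: the function names are hypothetical, and it assumes the conventional definition of the speedup ratio, S_p = T_1 / T_p (single-node run time over run time on the cluster), which the quoted passage does not spell out.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup ratio S_p, assumed here to be T_1 / T_p."""
    if t_parallel <= 0:
        raise ValueError("parallel run time must be positive")
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, n_nodes: int) -> float:
    """Parallel efficiency eta = S_p / N, as in the quoted formula."""
    if n_nodes <= 0:
        raise ValueError("number of cluster nodes must be positive")
    return speedup(t_single, t_parallel) / n_nodes


# Illustrative timings only (not measurements from any paper):
# 100 s on one node, 25 s on 8 nodes.
eta = efficiency(100.0, 25.0, 8)
```

An efficiency close to 1 would mean that adding nodes shortens the run almost proportionally; values well below 1 suggest that communication or scheduling overhead dominates.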
“…Jin et al. proposed a distributed version of the spectral clustering algorithm using the MapReduce Hadoop framework. The authors parallelized the similarity matrix computation and the subsequent inference of the dominant eigenvalues from the similarity matrix.…”
Section: Related Work (mentioning)
confidence: 99%
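The two parallelized steps named in this statement can be sketched in a single process to show how the work might be split. This is an assumption-laden illustration, not the authors' Hadoop implementation: the Gaussian similarity, the row-block partitioning, the use of power iteration for the dominant eigenvalue, and every function name below are choices made for the example.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Similarity exp(-||x - y||^2 / (2 sigma^2)); an assumed kernel."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def split_rows(n, n_blocks):
    """Partition row indices 0..n-1 into contiguous blocks (one per mapper)."""
    size = math.ceil(n / n_blocks)
    return [range(start, min(start + size, n)) for start in range(0, n, size)]


def map_similarity_rows(points, row_ids, sigma=1.0):
    """Map step: emit (row index, similarity row) pairs for one block."""
    return [(i, [gaussian_similarity(points[i], p, sigma) for p in points])
            for i in row_ids]


def reduce_rows(emitted, n):
    """Reduce step: assemble the emitted rows into an n x n matrix."""
    matrix = [None] * n
    for i, row in emitted:
        matrix[i] = row
    return matrix


def power_iteration(matrix, blocks, iters=200):
    """Estimate the dominant eigenpair; each block computes its slice of A v."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    eig = 0.0
    for _ in range(iters):
        w = [0.0] * n
        for block in blocks:          # each block stands in for one mapper
            for i in block:
                w[i] = sum(m * x for m, x in zip(matrix[i], v))
        eig = sum(a * b for a, b in zip(v, w))   # Rayleigh quotient v^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return eig, v


def dominant_eigenpair(points, n_blocks=2, sigma=1.0):
    """Run the map, reduce, and power-iteration steps in sequence."""
    blocks = split_rows(len(points), n_blocks)
    emitted = [pair for block in blocks
               for pair in map_similarity_rows(points, block, sigma)]
    matrix = reduce_rows(emitted, len(points))
    return power_iteration(matrix, blocks)


eig, vec = dominant_eigenpair([(0.0,), (1.0,)], n_blocks=2)
```

For the two points in the usage line, the similarity matrix is [[1, s], [s, 1]] with s = exp(-0.5), so the dominant eigenvalue is 1 + s. In a real MapReduce job each block would run on a separate node and only the vector v would be exchanged between iterations.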