Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016
DOI: 10.1145/2939672.2939796

Communication Efficient Distributed Kernel Principal Component Analysis

Abstract: Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality? In this paper, we …

Cited by 35 publications (64 citation statements)
References 21 publications
“…Second, even the implementation of simple methods is not straightforward when extremely large data sets are involved. In other words, devising and implementing a numerically efficient ‘Big Data PCA’ is a non‐trivial task (Balcan et al.). At least two steps must be considered: adopting an appropriate machine learning model (e.g.…”
Section: Big Data Analytics (mentioning)
confidence: 97%
“…Besides estimation, other distributed statistical technique may be of interests, such as the distributed principal component analysis (Balcan, Kanchanapally, Liang, & Woodruff, 2014), consensus-based distributed SVMs (Forero, Cano, & Giannakis, 2010), which utilizes ADMM (Boyd et al., 2011), and so on. Distributed version of topics like nonnegative matrix factorization, as a data analysis technique, high-dimensional structured nonparametric model, which is the sparse additive model (Fan, Feng, & Song, 2011; Ravikumar, Lafferty, Liu, & Wasserman, 2009), are also of interest.…”
Section: Related Work and Open Questions (mentioning)
confidence: 99%
“…With the consideration of sensor network applications, some DC methods have been proposed, such as a generic algorithm for distributed data clustering in sensor networks and the novel DKM algorithm for clustering observations collected by spatially distributed resource‐aware sensors. Recently, two K‐means‐based models, distributed PCA and K‐means and KPCA + K‐means clustering, were developed based on the PCA concept and kernel PCA concept. Mashayekhi et al. proposed GDCluster, a general fully decentralized clustering method, which is capable of clustering dynamic and distributed datasets.…”
Section: Data Mining Techniques In Distributed Environment (mentioning)
confidence: 99%