The Principal Components Analysis of a Graph, and Its Relationships to Spectral Clustering

Saerens, Marco; Fouss, François; Yen, Luh; Dupont, Pierre

doi:10.1007/978-3-540-30115-8_35

Cited by 156 publications

(170 citation statements)

References 21 publications

Mentioning: 168

“…A straightforward means of computing them is to solve the linear system $(I - \alpha A)x = e_j$ and $(L + \frac{1}{n} e e^T)y = e_i - e_j$. Then [19]). Solving these linear systems is an effective method to compute only the pairwise scores.…”

Section: Algorithms For Pairwise Score (mentioning)

confidence: 98%


Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

Esfandiar, Bonchi, Gleich, et al. 2010

Algorithms and Models for the Web-Graph


Abstract. Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
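The two linear systems quoted above can be sketched for a single node pair as follows. This is a minimal illustration, not the iterative bounds algorithm of the citing paper: it uses dense NumPy solves, and because the quoted passage is truncated it assumes the standard definitions, Katz score K = sum over l >= 1 of alpha^l A^l and commute time C_ij = vol(G) (e_i - e_j)^T L^+ (e_i - e_j). The function names `katz_pair` and `commute_time_pair` are hypothetical.

```python
import numpy as np

def katz_pair(A, i, j, alpha):
    """Katz score between nodes i and j from one linear solve.

    Assumes alpha < 1 / (largest eigenvalue of A) so the Katz series converges.
    """
    n = A.shape[0]
    e_j = np.zeros(n)
    e_j[j] = 1.0
    # Solve (I - alpha A) x = e_j; x is column j of (I - alpha A)^{-1}.
    x = np.linalg.solve(np.eye(n) - alpha * A, e_j)
    # The series starts at l = 1, so remove the identity term.
    return x[i] - (1.0 if i == j else 0.0)

def commute_time_pair(A, i, j):
    """Commute time between nodes i and j of a connected undirected graph."""
    n = A.shape[0]
    d = A.sum(axis=1)
    L = np.diag(d) - A                      # combinatorial Laplacian
    rhs = np.zeros(n)
    rhs[i] += 1.0
    rhs[j] -= 1.0
    # L + (1/n) e e^T is nonsingular for a connected graph, and its inverse
    # acts like the pseudoinverse of L on vectors orthogonal to e.
    y = np.linalg.solve(L + np.ones((n, n)) / n, rhs)
    return d.sum() * (y[i] - y[j])

if __name__ == "__main__":
    # Path graph 0 - 1 - 2
    A = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
    print(katz_pair(A, 0, 2, alpha=0.1))
    print(commute_time_pair(A, 0, 2))
```

On the 3-node path graph the commute time between the two end nodes is 8: the graph volume is 4 and the effective resistance is 2. For large sparse graphs a dense solve would be replaced by an iterative solver, which is the setting the citing paper addresses.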


“…Other uses of Katz scores and commute time are anomalous link detection [18], recommendation [20], and clustering [19].…”

Section: Introduction (mentioning)

confidence: 99%

Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

Esfandiar, Bonchi, Gleich, et al. 2010

Algorithms and Models for the Web-Graph


“…Spectral clustering can also be understood in terms of the spectral embedding of the graph, the change of representation of the data represented by nodes. Indeed, the spectral decomposition of the graph Laplacian gives a projection of the data in a new feature space in which Euclidean distance corresponds to a similarity given by the graph (e.g., the resistance distance [15,27]). …”

Section: Introduction (mentioning)

confidence: 99%

Fast Gaussian Pairwise Constrained Spectral Clustering

Chatel, Pelletier, Tommasi 2014

Machine Learning and Knowledge Discovery in Databases


Abstract. We consider the problem of spectral clustering with partial supervision in the form of must-link and cannot-link constraints. Such pairwise constraints are common in problems like coreference resolution in natural language processing. The approach developed in this paper is to learn a new representation space for the data together with a distance in this new space. The representation space is obtained through a constraint-driven linear transformation of a spectral embedding of the data. Constraints are expressed with a Gaussian function that locally reweights the similarities in the projected space. A global, non-convex optimization objective is then derived and the model is learned via gradient descent techniques. Our algorithm is evaluated on standard datasets and compared with state-of-the-art algorithms, like [14,18,31]. Results on these datasets, as well as on the CoNLL-2012 coreference resolution shared task dataset, show that our algorithm significantly outperforms related approaches and is also much more scalable.
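The remark in the quoted passage that Euclidean distance in a spectral embedding can correspond to the resistance distance can be checked numerically. The sketch below is an illustration under stated assumptions, not code from either paper: it uses the unnormalized Laplacian L = D - A of a connected undirected graph and scales each eigenvector with nonzero eigenvalue by the inverse square root of that eigenvalue. The function name `resistance_embedding` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def resistance_embedding(A, tol=1e-10):
    """Embed nodes so that squared Euclidean distance equals resistance distance.

    Row i of the result is the coordinate vector of node i. Assumes A is the
    symmetric adjacency (or weight) matrix of a connected graph.
    """
    d = A.sum(axis=1)
    L = np.diag(d) - A
    w, U = np.linalg.eigh(L)        # eigenvalues in ascending order
    keep = w > tol                  # drop the zero eigenvalue (constant vector)
    # Scaling by 1/sqrt(eigenvalue) gives Z Z^T = pseudoinverse of L.
    return U[:, keep] / np.sqrt(w[keep])

if __name__ == "__main__":
    # Path graph 0 - 1 - 2
    A = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
    Z = resistance_embedding(A)
    print(np.sum((Z[0] - Z[2]) ** 2))
```

Multiplying the coordinates by the square root of the graph volume (the sum of all degrees) turns the squared distances into commute times, since commute time equals volume times effective resistance.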


“…In recent years various spectral methods to perform these tasks, based on the eigenvectors of adjacency matrices of graphs on the data have been developed, see for example [1][2][3][4][5][6][7][8][9][10][11][12] and references therein. In the simplest version, known as the normalized graph Laplacian, given n data points $\{x_i\}_{i=1}^n$ where each $x_i \in \mathbb{R}^p$ (or some other normed vector space), we define a pairwise similarity matrix between points, for example using a Gaussian kernel with width $\sigma^2$,…”

Section: Introduction (mentioning)

confidence: 99%

“…A different theoretical analysis of the eigenvectors of the matrix M , based on the fact that M is a stochastic matrix representing a random walk on the graph was described by Meilǎ and Shi [14], who considered the case of piecewise constant eigenvectors for specific lumpable matrix structures. Additional notable works that considered the random walk aspects of spectral clustering are [10,15], where the authors suggest clustering based on the average commute time between points, [16,17] which considered the relaxation process of this random walk, and [18,19] which suggested random walk based agglomerative clustering algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

Diffusion Maps - a Probabilistic Interpretation for Spectral Embedding and Clustering Algorithms

Nadler, Lafon, Coifman, et al. 2008

Lecture Notes in Computational Science and Engineering


Summary. Spectral embedding and spectral clustering are common methods for non-linear dimensionality reduction and clustering of complex high dimensional datasets. In this paper we provide a diffusion based probabilistic analysis of algorithms that use the normalized graph Laplacian. Given the pairwise adjacency matrix of all points in a dataset, we define a random walk on the graph of points and a diffusion distance between any two points. We show that the diffusion distance is equal to the Euclidean distance in the embedded space with all eigenvectors of the normalized graph Laplacian. This identity shows that characteristic relaxation times and processes of the random walk on the graph are the key concept that governs the properties of these spectral clustering and spectral embedding algorithms. Specifically, for spectral clustering to succeed, a necessary condition is that the mean exit times from each cluster need to be significantly larger than the largest (slowest) of all relaxation times inside all of the individual clusters. For complex, multiscale data, this condition may not hold and multiscale methods need to be developed to handle such situations.
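The identity stated in this summary, that the diffusion distance equals the Euclidean distance in the embedding built from all eigenvectors, can be verified on a small example. The sketch below is an assumption-laden illustration rather than the authors' code: the quoted passage is cut off before the kernel formula, so the form exp(-||x_i - x_j||^2 / (2 sigma^2)) is assumed, the diffusion distance is taken as the squared difference of t-step transition rows weighted by the inverse stationary distribution, and all function names are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Pairwise Gaussian similarities (assumed form of the kernel)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def diffusion_map(W, t):
    """Diffusion-map coordinates at integer time t, using all eigenvectors."""
    d = W.sum(axis=1)
    # Symmetric conjugate of the random-walk matrix M = D^{-1} W.
    S = W / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(S)
    psi = V / np.sqrt(d)[:, None]           # right eigenvectors of M
    # The sqrt(volume) factor matches the 1/stationary weighting used below.
    return np.sqrt(d.sum()) * psi * lam ** t

def diffusion_distance_sq(W, t, i, j):
    """Squared diffusion distance computed directly from the t-step walk."""
    d = W.sum(axis=1)
    M = W / d[:, None]                      # row-stochastic transition matrix
    P = np.linalg.matrix_power(M, t)
    stationary = d / d.sum()
    return (((P[i] - P[j]) ** 2) / stationary).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 2))
    W = gaussian_affinity(X, sigma=1.0)
    Psi = diffusion_map(W, t=3)
    print(np.sum((Psi[0] - Psi[1]) ** 2), diffusion_distance_sq(W, 3, 0, 1))
```

The two printed numbers should agree up to floating-point error. The eigenvector with eigenvalue 1 is constant across nodes, so it cancels in every pairwise difference and can be dropped from the embedding without changing any distance.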
