Graph-based clustering methods have demonstrated their effectiveness in various applications. Generally, existing graph-based clustering methods first construct a graph to represent the input data and then partition it to generate the clustering result. However, such a stepwise manner may yield a graph that does not fit the requirements of the subsequent decomposition, leading to compromised clustering accuracy. To this end, we propose a joint learning framework, which is able to learn the graph and the clustering result simultaneously, such that the resulting graph is tailored to the clustering task. The proposed model is formulated as a well-defined nonnegative and off-diagonal constrained optimization problem, which is further efficiently solved with convergence theoretically guaranteed. The advantage of the proposed model is demonstrated by comparing with 19 state-of-the-art clustering methods on 10 datasets with 4 clustering metrics.

2. The proposed optimization method has the following theoretical guarantees: i) the constraints in the proposed model are naturally satisfied in the optimization process; ii) each optimization step decreases the value of the objective function; and iii) the converged limit point is a stationary point that satisfies the Karush-Kuhn-Tucker (KKT) conditions.

The rest of this paper is organized as follows. In Section II, we discuss the related work. Section III presents the proposed model, the optimization method, its computational complexity analysis, and its theoretical guarantees. Experimental comparisons and analyses are shown in Section IV, and finally Section V concludes this paper.
II. RELATED WORK
A. Notation

Throughout this paper, matrices are denoted by boldface uppercase letters, e.g., A, and the element at the ith row and jth column of a matrix is denoted as A_ij or a_ij. Vectors are represented by boldface lowercase letters, e.g., a, and scalars are represented by italic lowercase letters, e.g., a. Moreover, A^T stands for the transpose of a matrix, $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$ is the Frobenius norm of matrix A, $\|A\|_\infty = \max_{ij} |A_{ij}|$ returns the maximum absolute value of matrix A, diag(·) returns the diagonal elements of a matrix as a vector, ⊙ denotes the Hadamard product of two matrices, i.e., the element-wise multiplication of two matrices, exp(·) returns the exponential value, ⟨·, ·⟩ calculates the inner product of two matrices, I_k denotes an identity matrix of size k × k, and A ≥ 0 means each element of A is greater than or equal to 0, i.e., A_ij ≥ 0, ∀i, j. X = [x_1, x_2, . . . , x_n] ∈ R^{d×n} denotes the input data, x_i ∈ R^{d×1} is the ith sample, and n, d, and c represent the number of samples, the dimension of features, and the number of classes, respectively.
B. Graph-based Clustering

Different from traditional clustering methods (like K-means) that partition the raw features X directly, graph clustering [15], [13] transforms data clustering into a graph partition problem. Specifically, a typical graph clustering method is composed of the following steps:
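The stepwise graph-construction-then-partition pipeline can be sketched as follows. This is a minimal illustration instantiated as standard spectral clustering (a Gaussian-kernel similarity graph followed by a normalized-Laplacian partition); the function name, kernel choice, and parameters are illustrative assumptions, not the paper's proposed joint model.

```python
import numpy as np

# A minimal sketch of the generic two-step graph-clustering pipeline:
# (1) construct a similarity graph from the data; (2) partition it.
# Here step (2) uses the symmetric normalized Laplacian and k-means,
# i.e., standard spectral clustering. Names are illustrative only.
def graph_clustering(X, c, sigma=1.0):
    """X: d x n data matrix (columns are samples); c: number of clusters."""
    n = X.shape[1]
    # Step 1: build the similarity graph W with a Gaussian kernel.
    sq_dists = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 2: partition via L = I - D^{-1/2} W D^{-1/2}; the c smallest
    # eigenvectors give a spectral embedding of the samples.
    deg = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    U = vecs[:, :c]
    # Cluster the embedded rows with Lloyd's k-means, seeded by a
    # deterministic farthest-point initialization.
    idx = [0]
    for _ in range(1, c):
        d2 = ((U[:, None, :] - U[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    centers = U[idx].copy()
    for _ in range(100):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        new_centers = np.array([U[labels == k].mean(axis=0) if (labels == k).any()
                                else centers[k] for k in range(c)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Because the graph is fixed before the partition step, any mismatch between W and the clustering objective cannot be corrected afterwards, which is exactly the limitation the joint learning framework in this paper aims to remove.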