In most spectral clustering approaches, a Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on datasets with nonlinear and elongated structures. In this paper, we present a new similarity measure to deal with this nonlinearity issue. The maximum flow between data points is computed as the new similarity, which satisfies the requirements for a similarity in the clustering method. Additionally, the new similarity carries both the global and local relations between data points. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experimental results show the superiority of the new similarity: 1) the max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) it is robust and not sensitive to the parameters.
Keywords: Spectral clustering, maximum flow, affinity graph, similarity measure.
Manuscript received July 31, 2012; revised Oct. 7, 2012; accepted Oct. 22, 2012.
This work was supported by the National Natural Science Foundation of China through the program 61173083, by the Ministry of Science and Technology, China, through the 973 Program 2011CB302200, and by the Economic & Information Commission of Guangdong province through the Program GDIID2008IS007.
Jiangzhong Cao (phone: +86 135 6008 2826, cjz510@gdut.edu.cn) is with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China, and also with the School of Information Engineering, Guangdong University of Technology, Guangzhou, China.
Pei Chen (chenpei@mail.sysu.edu.cn) and Yun Zheng (zhengyun84@gmail.com) are with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China.
Qingyun Dai (daiqy@gdut.edu.cn) is with the School of Information Engineering, Guangdong University of Technology, Guangzhou, China.
http://dx.doi.org/10.4218/etrij.13.0112.0520
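To make the idea in the abstract concrete, the following is a minimal sketch of a max-flow-based similarity, not the authors' implementation. The abstract does not state how the flow network is built, so the sketch assumes a fully connected undirected graph whose edge capacities are Gaussian kernel weights, and takes the max-flow value between two points as their similarity. The function names, the parameter `sigma`, and the toy points are hypothetical, and Edmonds-Karp is used only as a convenient standard max-flow algorithm.

```python
import math
from collections import deque


def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian kernel weights exp(-d^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dense capacity matrix."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total  # no augmenting path left
        # Find the bottleneck capacity along the path found by the search.
        bottleneck = math.inf
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount of flow along the path.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def max_flow_similarity(points, sigma=1.0):
    """Assumed construction: similarity(i, j) = max flow from i to j."""
    w = gaussian_affinity(points, sigma)
    n = len(points)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s[i][j] = s[j][i] = max_flow(w, i, j)
    return s


if __name__ == "__main__":
    # Toy elongated structure: four points along a line (hypothetical data).
    chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(gaussian_affinity(chain)[0][3], max_flow_similarity(chain)[0][3])
```

In this sketch the two ends of the chain get a larger max-flow similarity than their direct Gaussian weight, because flow can also travel through the intermediate points. This illustrates how a flow-based measure can reflect both local and global relations.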
I. Introduction
Spectral clustering has attracted a significant amount of attention [1]-[4] due to its impressive performance on some challenging clustering datasets, with successful applications in computer vision [5], [6], VLSI design [7], and speech processing [8], [9]. It has been shown that the affinity matrix is crucial to the performance of spectral clustering [10]-[16]. Most spectral clustering methods adopt the Gaussian kernel function as a similarity measure to construct the affinity matrix [5], [11]-[13]; they differ only in how the kernel parameters are chosen. In [11], a fixed scaling parameter controls how fast the similarity falls off with the distance between points. In [12], a self-tuning parameter is used to adapt to multiscale datasets. In [13], the Gaussian kernel function is scaled according to the local density between data points so that the similarity between two points is higher if there are more common points in their ε-neighborhoods.
Though the Gaussian kernel-based similarity measure can describe the information of the loc...