Abstract-In fuzzy clustering, soft cluster partitions are formed based on the similarity of the data points to the respective cluster prototypes. Similarity is defined in terms of simultaneous closeness with respect to all attributes. In some applications the values of many attributes have been measured, but a natural clustering, if it exists, occurs only within a (small) subset of these attributes. The remaining dimensions can be considered irrelevant: they can obscure an existing grouping and make it harder to discover the cluster structure. In probabilistic fuzzy clustering, irrelevant attributes can in the worst case lead to coinciding cluster centers. We study this effect in detail, as well as the robustness of different similarity functions and their possible parameterizations against irrelevant input dimensions. Empirical evidence is given for the different properties of the membership functions.
I. FUZZY CLUSTERING

Most fuzzy clustering algorithms are objective function based: they determine an optimal (fuzzy) partition of a given data set X = {x_j | j = 1, ..., n} into c clusters by minimizing the objective function

J(X, U, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}    (1)

subject to the constraints

\sum_{j=1}^{n} u_{ij} > 0  for all i ∈ {1, ..., c}, and    (2)

\sum_{i=1}^{c} u_{ij} = 1  for all j ∈ {1, ..., n}.    (3)

Here u_ij ∈ [0, 1] is the membership degree of datum x_j to cluster i and d_ij is the distance between datum x_j and cluster i. The c × n matrix U = (u_ij) is called the fuzzy partition matrix, and C describes the set of clusters by stating location parameters (i.e. the cluster center) and possibly size and shape parameters for each cluster. The parameter m, m > 1, is called the fuzzifier or weighting exponent. It determines the "fuzziness" of the classification: with higher values of m the boundaries between the clusters become softer, with lower values they become harder. Usually m = 2 is chosen. Constraint (2) guarantees that no cluster is empty. Constraint (3) ensures that the membership degrees of a datum to the clusters sum up to 1, so that each datum has the same total influence. Because of the second constraint this approach is usually called probabilistic fuzzy clustering, since with it the membership degrees of a datum formally resemble the probabilities of its being a member of the corresponding clusters. The partitioning property of a probabilistic clustering algorithm, which "distributes" the weight of a datum over the different clusters, is due to this constraint.

Unfortunately, the objective function J cannot be minimized directly. Therefore an iterative algorithm is used, which alternately optimizes the membership degrees and the cluster parameters: first the membership degrees are optimized for fixed cluster parameters, then the cluster parameters are optimized for fixed membership degrees. The main advantage of this scheme is that in each of the two steps the optimum can be computed directly. By iterating the two steps the joint optimum is approached, although it cannot be guaranteed that the global optimum will be reached: the algorithm may get stuck in a local minimum of the objective function J. The update formulae are derived ...
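For the most common case of point prototypes (each cluster described only by a center vector c_i) and squared Euclidean distances d_ij^2 = ||x_j - c_i||^2, these derivations lead to the well-known fuzzy c-means update formulae, stated here in their textbook form for reference; variants for other prototype shapes or distance measures differ:

u_{ij} = \frac{d_{ij}^{-2/(m-1)}}{\sum_{k=1}^{c} d_{kj}^{-2/(m-1)}}, \qquad
c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}.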
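The alternating optimization scheme described above can be sketched in a few lines. The following minimal Python sketch assumes point prototypes and squared Euclidean distances; the function name fuzzy_c_means, its parameters, the random initialization of U, and the convergence test on the change of U are illustrative choices, not prescriptions from this paper.

import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Probabilistic fuzzy c-means (illustrative sketch, not the paper's exact variant).

    X : (n, p) array of n data points with p attributes
    c : number of clusters
    m : fuzzifier (m > 1), with m = 2 as the usual default
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Random fuzzy partition matrix U (c x n); each column sums to 1,
    # so constraint (3) holds from the start.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        # Step 1: optimize the cluster centers for fixed membership degrees.
        W = U ** m                                    # weights u_ij^m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)

        # Step 2: optimize the membership degrees for fixed cluster centers.
        # Squared Euclidean distances d_ij^2 of every datum to every center.
        d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                       # guard against d_ij = 0
        U_new = d2 ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0, keepdims=True)     # re-normalize: constraint (3)

        # Stop when the partition matrix barely changes any more.
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new

    return centers, U

For example, fuzzy_c_means(X, c=3) returns the cluster centers together with the fuzzy partition matrix U = (u_ij). Increasing the fuzzifier m softens the boundaries between the clusters, while values close to 1 make the partition nearly crisp, in line with the role of m described above.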