The conventional fuzzy C-means (FCM) algorithm is sensitive to the initial cluster centers and to outliers, which may cause the converged centers to deviate from the true centers. To improve the performance of FCM, a method for initializing the cluster centers based on probabilistic suppression is proposed, and an improved local outlier factor is integrated into the FCM model. First, the probability of an object being a cluster center is defined by its local density, and the initial centers are obtained incrementally from this probability and a probability suppression function. Next, an improved local outlier factor is constructed according to the local distribution of an object, and its reciprocal is taken as the object's contribution degree to the cluster centers. Then, the improved local outlier factor is integrated into FCM to alleviate the negative effect of outliers. Finally, experiments on synthetic and real-world datasets demonstrate the clustering performance and noise resistance of the proposed method.
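The idea of letting each object contribute to the cluster centers in proportion to a per-object weight can be illustrated with a minimal sketch. The abstract does not give the model's equations, so the code below is only an assumption: it uses the standard FCM membership update and a weighted center update in which `weights[i]` stands in for the reciprocal of an outlier factor. The function names `fcm_memberships` and `weighted_fcm` are hypothetical, and the probabilistic-suppression initialization is not reproduced; initial centers are passed in by the caller.

```python
import math


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update for fixed centers (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # Clamp distances away from zero so a point sitting on a center is safe.
        dists = [max(math.dist(x, c), eps) for c in centers]
        rows.append([1.0 / sum((dj / dk) ** exponent for dk in dists)
                     for dj in dists])
    return rows


def weighted_fcm(points, weights, init_centers, m=2.0, n_iter=50):
    """FCM variant in which object i pulls on the centers with weights[i].

    Assumption: weights are positive, e.g. the reciprocal of some outlier
    factor, so suspected outliers receive small weights.
    """
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(n_iter):
        u = fcm_memberships(points, centers, m)
        for j in range(len(centers)):
            # Coefficient of object i for center j: w_i * u_ij^m.
            coeffs = [w * row[j] ** m for w, row in zip(weights, u)]
            total = sum(coeffs)
            centers[j] = [sum(c * x[d] for c, x in zip(coeffs, points)) / total
                          for d in range(dim)]
    return centers, fcm_memberships(points, centers, m)
```

With all weights equal to 1 this reduces to plain FCM, which makes it easy to compare how far a single distant object drags a center with and without down-weighting.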
Outlier detection is an active topic in data mining with many real-world applications. The local outlier factor (LOF) can capture the degree of abnormality of objects in datasets with varying density levels, and many extensions have been proposed in recent years. However, LOF must search the whole dataset for the nearest neighbors of each object, which greatly increases the time cost. Moreover, most of these extensions consider only the distance between an object and its neighbors and ignore the local distribution of the object within its neighborhood, resulting in a high false-positive rate. To improve the running speed, a rough clustering method based on triple fusion is proposed, which divides the dataset into several subsets so that outlier detection is performed only within each subset. Then, considering the local distribution of an object within its neighborhood, a new local outlier factor is constructed to estimate the degree of abnormality of each object. Finally, experimental results indicate that the proposed algorithm achieves better detection performance and lower running time than the compared algorithms.
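For readers unfamiliar with the baseline, the classic LOF computation that this abstract builds on can be sketched as follows. This is the original distance-based LOF, not the new factor proposed here, and the rough-clustering step is not reproduced. The function name `local_outlier_factor` is hypothetical, and as a simplification the sketch keeps exactly `k` neighbors per object instead of including all ties at the k-distance.

```python
import math


def local_outlier_factor(points, k):
    """Classic LOF scores; values well above 1 suggest an outlier."""
    n = len(points)
    # Brute-force pairwise distances: every object is compared with the
    # whole dataset, which is the time cost the abstract refers to.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_dist[j], dist[i][j]).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))

    # LOF: mean density of the neighbors relative to the object's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]
```

On a small 3x3 grid with one distant point appended, the distant point receives by far the largest score, while the grid points stay close to 1.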