Abstract: In this paper, a novel Feature-Reduction Fuzzy C-Means (FRFCM) with Feature Linkage Weight (FRFCM-FLW) algorithm is introduced. By combining FRFCM with feature linkage weights, we develop a new feature selection model, called the Feature Linkage Weight Based FRFCM, using fuzzy clustering. The larger the number of features, the more complicated the problem and the more time is spent producing the output of the classifier or model. Feature selection has been established as a…
“…The total number of Samples (30) [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]} P2{[0, 2,4,6,8,10,12,14,16,18,20,22,24,26,28], [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29]} P3{[0, 1,2,3,4,5,6,7,8,…”
Section: All Predictions Correctly Number Of Samples (mentioning)
confidence: 99%
“…In order to make clustering widely available in more fields, it can be applied to large-scale group decision-making [8,9]. Existing clustering algorithms mainly include hard clustering [10,11] and fuzzy clustering [12][13][14]. The former has only two membership degrees, 0 and 1; that is, each data object is strictly assigned to a single cluster. The membership in the latter can take any value within the interval [0,1]; that is, a data object can belong to multiple clusters with different memberships.…”
Among fuzzy clustering algorithms, the possibilistic fuzzy clustering algorithm has been widely used in many fields. However, the traditional Euclidean distance cannot measure the similarity between samples well in high-dimensional data. Moreover, if clusters overlap or features are strongly correlated, clustering accuracy is easily affected. To overcome these problems, a collaborative possibilistic fuzzy clustering algorithm based on the information bottleneck is proposed in this paper. The algorithm retains the advantages of the original algorithm: on the one hand, it uses mutual information loss as the similarity measure instead of Euclidean distance, which reduces the subjective error caused by an arbitrary choice of similarity measure and improves clustering accuracy; on the other hand, it introduces the collaborative idea into information-bottleneck-based possibilistic fuzzy clustering, forming an accurate and complete representation of the data's organizational structure by making full use of the correlations between different feature subsets for collaborative clustering. To examine the clustering performance of this algorithm, five algorithms were selected for comparison experiments on several datasets. Experimental results show that the proposed algorithm outperforms the comparison algorithms in terms of clustering accuracy and collaborative validity.
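The fuzzy membership update at the heart of such algorithms can be sketched generically. Below is a minimal version of the standard fuzzy c-means membership formula, u_ik = 1 / Σ_j (d_ik / d_ij)^(2/(m−1)), written to accept any precomputed distance matrix; the paper's contribution of replacing Euclidean distance with a mutual-information loss would plug in at the point where D is computed. The function name and interface are illustrative, not the authors' code.

```python
import numpy as np

def fcm_memberships(D, m=2.0):
    """Fuzzy c-means membership update from a distance matrix D
    (n samples x c clusters): u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
    The similarity measure (Euclidean, mutual-information loss, ...)
    is whatever produced D, so the update stays measure-agnostic."""
    D = np.asarray(D, dtype=float)
    D = np.maximum(D, 1e-12)                 # guard against zero distances
    # ratio[i, k, j] = (d_ik / d_ij) ** (2 / (m - 1))
    ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)           # rows sum to 1
```

With m = 2 and distances [1, 2] to two cluster prototypes, a sample gets memberships [0.8, 0.2]: closer prototypes receive proportionally higher membership, and equal distances yield equal membership.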
“…Another strategy is the nonparametric strategy, which includes multiple techniques; one example is Random Forest [17]. Many researchers have used variable importance measurement strategies to enhance classifier performance, such as [18], naive Bayes text classifiers [19,20], the fuzzy clustering method, and feature weighting for neural networks [21,22] and SVMs [23]. Feature weighting has also been used as a feature selection strategy to determine the influence of features on results and then exclude irrelevant and redundant features [24][25][26], as well as the information gain attribute [27].…”
In data analysis and machine learning, assigning an appropriate importance to each feature, known as feature weighting, plays a pivotal role, especially given the interplay between the symmetry of the data distribution and the need to weight individual features differently. Avoiding the dominance of large-scale features is also essential in data preparation, which makes choosing an effective normalization approach one of the more challenging aspects of machine learning. In addition to normalization, feature weighting is another strategy for handling the differing importance of features. One way to measure the dependency between features is the correlation coefficient, which indicates the strength of the relationship between them. The integration of normalization with feature weighting in data transformation for classification has not been extensively studied. The goal is to improve the accuracy of classification methods by balancing the normalization step with assigning greater importance to features strongly related to the class feature. To achieve this, we combine Min–Max normalization with feature weighting, increasing feature values based on their correlation coefficients with the class feature. This paper presents the proposed Correlation Coefficient with Min–Max Weighted (CCMMW) approach, in which the normalized data depend on their correlation with the class feature. Logistic regression, support vector machine, k-nearest neighbor, neural network, and naive Bayes classifiers were used to evaluate the proposed method, on twenty numerical datasets from the UCI Machine Learning Repository and Kaggle. The empirical results show that the proposed CCMMW significantly improves classification performance with the support vector machine, logistic regression, and neural network classifiers on most datasets.
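The core transformation described above can be sketched in a few lines: Min–Max scale each feature, then amplify it according to its correlation with the class. The function name and the specific (1 + |r|) weighting below are illustrative assumptions, not the paper's exact CCMMW formula.

```python
import numpy as np

def correlation_weighted_minmax(X, y):
    """Min-Max normalize each column of X to [0, 1], then scale it by
    (1 + |r|), where r is the Pearson correlation between that feature
    and the class vector y. Illustrative sketch of the CCMMW idea."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)   # avoid /0 on constant columns
    X_norm = (X - mins) / span                        # Min-Max to [0, 1]
    # Pearson correlation of each feature with the class feature
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    r = np.nan_to_num(r)                              # constant feature -> r = 0
    return X_norm * (1.0 + np.abs(r))                 # emphasize class-correlated features
```

A feature perfectly correlated with the class ends up on a [0, 2] scale while an uncorrelated one stays on [0, 1], so distance-based classifiers such as k-NN and SVM give the informative feature roughly twice the influence.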
“…Although fuzzy clustering can effectively deal with high-dimensional feature data through feature reduction [7][8][9][10][11][12][13], it is still difficult to process large-scale data, especially streaming data. Previously, to enable large-scale data clustering [14][15][16][17][18], Hore et al. [19,20] proposed two incremental algorithms, named SPFCM (Single-Pass Fuzzy C-Means) and OFCM (Online Fuzzy C-Means), based on single-pass and online clustering strategies, respectively.…”
In the era of big data, more and more datasets fall outside the scope of traditional clustering algorithms because of their large scale and high dimensionality. To break through these limitations, incremental mechanisms and feature reduction have become two indispensable parts of current clustering algorithms. We propose two incremental fuzzy clustering algorithms based on feature reduction, built on single-pass and online incremental strategies, respectively. The first uses the Weighted Feature Reduction Fuzzy C-Means (WFRFCM) clustering algorithm to process each chunk in turn, folding the clustering result of the previous chunk into the computation for the next chunk. The second runs WFRFCM on every chunk at the same time and then combines and re-clusters the per-chunk results. To investigate the clustering performance of these two algorithms, six datasets were selected for comparative experiments. Experimental results showed that both algorithms select high-quality features through feature reduction and handle large-scale data through the incremental strategy; the combination of the two phases maintains clustering efficiency while keeping high clustering accuracy.
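The single-pass strategy can be illustrated with a simplified sketch: each chunk is clustered together with the previous chunk's centroids, which are carried forward as weighted points summarizing all earlier data. For brevity this uses hard weighted k-means rather than the paper's WFRFCM, and the function name and chunk interface are assumptions.

```python
import numpy as np

def single_pass_cluster(chunks, c, iters=20, seed=0):
    """SPFCM-style single-pass sketch: cluster each chunk together with
    the previous centroids, whose weights equal the amount of data they
    summarize (hard weighted k-means stands in for weighted FCM)."""
    rng = np.random.default_rng(seed)
    centers, weights = None, None
    for chunk in chunks:
        X = np.asarray(chunk, dtype=float)
        w = np.ones(len(X))
        if centers is None:
            centers = X[rng.choice(len(X), c, replace=False)]  # init from first chunk
        else:
            X = np.vstack([X, centers])            # fold in summary of past chunks
            w = np.concatenate([w, weights])
        for _ in range(iters):                     # weighted Lloyd iterations
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            lbl = d.argmin(axis=1)
            centers = np.array([
                np.average(X[lbl == k], axis=0, weights=w[lbl == k])
                if (lbl == k).any() else centers[k]
                for k in range(c)])
        weights = np.array([w[lbl == k].sum() for k in range(c)])
    return centers
```

Only c weighted centroids cross chunk boundaries, so memory stays constant no matter how many chunks stream in; the online variant would instead cluster chunks independently and merge their centroids in a final pass.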
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.