Data sets vary widely and their data types are cumbersome and diverse, so they are likely to contain many redundant attributes, which greatly increases classification time in the setting of rough set theory. In this paper, we improve the attribute reduction algorithm using the information gain ratio. The data sets obtained after attribute reduction with this method are used for classification, and the results are compared with classification on the unreduced data sets and with other common classification methods. Experimental data verify that the improved algorithm is effective: it greatly improves classification speed and shortens the time spent.
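The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of the standard information gain ratio for discrete attributes, with a simple threshold filter standing in for attribute reduction. The function names `entropy`, `gain_ratio`, and `reduce_attributes`, and the `threshold` parameter, are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of `column` about `labels`, divided by the split information."""
    n = len(labels)
    # Group the class labels by attribute value to get H(labels | column).
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(column)
    # A constant attribute has zero split information; treat its ratio as 0.
    return gain / split_info if split_info > 0 else 0.0


def reduce_attributes(rows, labels, threshold):
    """Hypothetical filter: keep indices of attributes whose gain ratio >= threshold."""
    n_attrs = len(rows[0])
    return [j for j in range(n_attrs)
            if gain_ratio([row[j] for row in rows], labels) >= threshold]


if __name__ == "__main__":
    # Toy data: attribute 0 determines the label, attribute 1 is irrelevant.
    rows = [(0, "a"), (0, "b"), (1, "a"), (1, "b")]
    labels = [0, 0, 1, 1]
    print(reduce_attributes(rows, labels, threshold=0.5))
```

In the toy data the first attribute fully determines the class and is kept, while the second carries no information about the class and is dropped. A rough-set reduction would normally also check that the remaining attributes preserve the classification ability of the full set, which this filter does not do.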
In view of the rising number of breast cancer patients, and to enable patients to predict for themselves from physical examination data whether they have breast cancer, this paper proposes a method for mining breast cancer association rules based on fuzzy rough sets. The method first analyzes the attributes in traditional blood data, then applies fuzzy rough set attribute reduction to delete the attributes irrelevant to breast cancer, and uses the Apriori algorithm to obtain the frequent itemsets over the remaining attributes. Low support and high confidence thresholds are then applied to extract many practical, strong association rules. Specific examples verify this method. The experimental results show that it mines more, and higher-quality, rules than traditional algorithms. These extracted rules also have high reference value for diagnosing and preventing breast cancer.
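The Apriori step with support and confidence thresholds can be sketched as follows. This is a minimal pure-Python illustration, assuming the attributes have already been reduced and discretized into items; the fuzzy rough set reduction itself is not shown. The function names `apriori` and `rules` and the item labels in the example are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, confidence))
    return found


if __name__ == "__main__":
    # Hypothetical discretized records; the labels are placeholders, not real markers.
    data = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b"}]
    for lhs, rhs, sup, conf in rules(apriori(data, 0.5), 0.6):
        print(sorted(lhs), "->", sorted(rhs), round(sup, 2), round(conf, 2))
```

Lowering `min_support` while keeping `min_confidence` high admits rarer itemsets but retains only rules whose consequent follows reliably from the antecedent, which is the combination the abstract describes.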
To objectively evaluate the input-output level of distribution network projects and improve the investment benefit of the distribution network, this paper considers multi-dimensional driving factors, constructs a comprehensive and systematic investment benefit evaluation system, and proposes a cluster evaluation model for distribution network investment benefit. In this model, a genetic algorithm and a particle swarm optimization algorithm are applied to the K-means clustering model to realize collaborative clustering and improve the clustering effect. A heuristic algorithm is added to enhance clustering efficiency, and the effectiveness and accuracy of the improved clustering algorithm are verified by experiments. Finally, the model is applied to an example to evaluate investment benefit by clustering, and the results are used for a comprehensive post-evaluation of electric power investment projects, supporting lean management and control of power grid companies' investment in distribution network construction and transformation.
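The abstract does not specify how the genetic algorithm and particle swarm optimization are coupled to K-means, so the sketch below shows only the plain K-means (Lloyd) iteration that such hybrids build on, together with the within-cluster sum of squared errors. Using that error as the fitness for an evolutionary or swarm search over initial centroids is a common choice and is an assumption here, not the paper's stated design. The names `kmeans`, `sse`, and `dist2` are illustrative.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iteration starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def sse(points, centroids, labels):
    """Within-cluster sum of squared errors; lower means tighter clusters."""
    return sum(dist2(p, centroids[label]) for p, label in zip(points, labels))


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 10), (10, 11)]
    centroids, labels = kmeans(points, [(0, 0), (10, 10)])
    print(centroids, labels, sse(points, centroids, labels))
```

Because K-means converges only to a local optimum that depends on the starting centroids, a global search that proposes starting centroids and scores them with `sse` is one way such a hybrid could improve the clustering result.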
Feature selection has been shown to be a highly valuable strategy in data mining, pattern recognition, and machine learning. However, the majority of proposed feature selection methods do not account for feature interaction when calculating feature correlations. Interactive features are features that have little individual relevance to the class but provide more joint information about the class when combined with other features. Motivated by this, a novel feature selection algorithm that considers feature relevance, redundancy, and interaction in neighborhood rough sets is proposed. First, a new information measure called neighborhood symmetric uncertainty is proposed to measure what proportion of the information about the category label a feature contains. Next, a new objective evaluation function for interactive selection is developed. Then a novel feature selection algorithm named NSUNCMI, based on measuring feature correlation, redundancy, and interactivity, is proposed. Results on nine universe datasets, compared against five representative feature selection algorithms, indicate that NSUNCMI reduces the dimensionality of the feature space efficiently and offers the best average classification accuracy.
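The neighborhood symmetric uncertainty measure is not defined in the abstract, so the sketch below uses the classical symmetric uncertainty for discrete variables, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), purely to illustrate what an interactive feature is. It is not the NSUNCMI measure, and the function names are assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """Classical symmetric uncertainty, normalized to the range [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_information(x, y) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # XOR example: each feature alone says nothing about the class,
    # but the pair of features determines it completely.
    f1 = [0, 0, 1, 1]
    f2 = [0, 1, 0, 1]
    label = [0, 1, 1, 0]
    print(symmetric_uncertainty(f1, label))
    print(symmetric_uncertainty(f2, label))
    print(symmetric_uncertainty(list(zip(f1, f2)), label))
```

In this XOR example each feature has zero symmetric uncertainty with the class on its own, while the two features taken jointly have a clearly positive score. A method that ranks features one at a time would discard both, which is the gap that interaction-aware selection aims to close.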
In practical association rule mining, data sets collected from enterprises or real life often have problems such as large amounts of missing or redundant data, which greatly increases the space complexity of mining association rules and lowers mining efficiency. Moreover, some real data sets contain hundreds of attributes or more. Mining association rules then takes too long and yields too many rules, making it difficult for users to tell which information is more valuable in practical applications and difficult to apply the data in enterprises for greater benefit. In response to these problems, this paper proposes an association rule algorithm that combines the FP-Growth algorithm with attribute reduction based on the information gain ratio, in order to extract more valuable information and improve the efficiency of association rule mining. Finally, experiments and comparisons verify that the proposed algorithm can effectively mine association rule information from multi-attribute data sets.