“…However, as discussed in [8,14], binary classification of instances as positive or negative is sometimes too strict and can result in high misclassification costs. Three-way classification can also leave an email unclassified in case of low confidence in classification.…”
Section: Multiobjective Problem Formulation
Abstract. This paper presents a 4-objective evolutionary multiobjective optimization study for optimizing the error rates (false positives, false negatives), reliability, and complexity of binary classifiers. The example taken is the email anti-spam filtering problem. The two major goals of the optimization are to minimize the error rates, that is, the false negative rate and the false positive rate. Our approach discusses three-way classification, that is, the binary classifier can also decline to classify an instance when there is not enough evidence to assign it to one of the two classes. In this case the instance is marked as suspicious but still presented to the user. The number of unclassified (suspicious) instances should be minimized, as long as this does not lead to errors. This will be termed the coverage objective. The size of the set (ensemble) of rules needed for the anti-spam filter to operate in optimal conditions is addressed as a fourth objective. The objectives stated above generally conflict with each other, which is why we address the problem as a 4-objective (quadcriteria) optimization problem. We assess the performance of a set of state-of-the-art evolutionary multiobjective optimization algorithms: NSGA-II, SPEA2, and the hypervolume indicator-based SMS-EMOA. Focusing on the anti-spam filter optimization, statistical comparisons of algorithm performance are provided on several benchmarks and a range of performance indicators. Moreover, the resulting 4-D Pareto hyper-surface is discussed in the context of binary classifier optimization.
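The four objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the score-based filter, the two thresholds `lower` and `upper`, and the function names `three_way_label`, `objectives`, and `dominates` are assumptions made for illustration, and the rule count is simply passed in as a number.

```python
from typing import List, Sequence, Tuple


def three_way_label(score: float, lower: float, upper: float) -> str:
    """Assumed decision rule: classify only when the score is decisive."""
    if score >= upper:
        return "spam"
    if score <= lower:
        return "ham"
    return "suspicious"  # left unclassified, still shown to the user


def objectives(scores: List[float], is_spam: List[bool],
               lower: float, upper: float,
               n_rules: int) -> Tuple[float, float, float, int]:
    """Return (FP rate, FN rate, unclassified fraction, rule count); all minimized."""
    labels = [three_way_label(s, lower, upper) for s in scores]
    n_spam = sum(is_spam)
    n_ham = len(is_spam) - n_spam
    # False positive: legitimate mail labelled spam.
    fp = sum(1 for lab, spam in zip(labels, is_spam) if lab == "spam" and not spam)
    # False negative: spam labelled legitimate.
    fn = sum(1 for lab, spam in zip(labels, is_spam) if lab == "ham" and spam)
    unclassified = labels.count("suspicious")
    return (fp / n_ham if n_ham else 0.0,
            fn / n_spam if n_spam else 0.0,
            unclassified / len(labels) if labels else 0.0,
            n_rules)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
```

Widening the gap between the two thresholds typically lowers both error rates while raising the unclassified fraction, which is one way to see why the objectives conflict and why a set of mutually non-dominated filters is reported instead of a single optimum.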
“…The rough set (RS) theory proposed by the Polish scientist Z. Pawlak [16][17][18][19] is another new data analysis method applied to the uncertain information mathematical tool in addition to the probability theory and fuzzy set. A large number of historical data from the industrial production process may be ambiguous, uncertain and incomplete.…”
Section: Attribute Reduction Based on Rough Set (RS) Theory
Abstract: In order to realize fault diagnosis of the polyvinyl chloride (PVC) polymerization kettle reactor, a rough set (RS)-probabilistic neural network (PNN) fault diagnosis strategy is proposed. Firstly, by analysing the technique of the PVC polymerization reactor, the mapping between the polymerization process data and the fault modes is established. Then, rough set theory is used to reduce the input vector of the PNN so as to lower the network dimensionality and improve the training speed of the PNN. The shuffled frog leaping algorithm (SFLA) is adopted to optimize the smoothing factor of the PNN. The fault pattern classification of the polymerization kettle equipment realizes the nonlinear mapping from the symptom set to the fault set according to the given symptom set. Finally, fault diagnosis simulation experiments are conducted on industrial on-site historical data from the polymerization kettle, and the results show that the RS-PNN fault diagnosis strategy is effective.
“…Through the three decades of the development, rough set has been demonstrated to be useful in knowledge acquisition [5,10], pattern recognition [3,7,23], machine learning [6,8,9,11,26], decision support [21,44,46,57] and so on.…”
A simple multigranulation rough set approach is to approximate the target through a family of binary relations. Optimistic and pessimistic multigranulation rough sets are two typical examples of such an approach. However, these two multigranulation rough sets do not take the frequencies of occurrences of containments or intersections into account. To solve this problem, and motivated by the multiset, the model of the multiple multigranulation rough set is proposed, in which both the lower and upper approximations are multisets. These two multisets are useful for counting how often an object belongs to the lower or upper approximation under a family of binary relations. Furthermore, the concept of the approximate distribution reduct is introduced into the multiple multigranulation rough set, and a heuristic algorithm is presented for computing a reduct. Finally, the multiple multigranulation rough set approach is tested on eight UCI (University of California-Irvine) data sets. Experimental results show: (1) the approximate quality based on the multiple multigranulation rough set lies between the approximate qualities based on the optimistic and pessimistic multigranulation rough sets; (2) compared with the optimistic and pessimistic multigranulation rough sets, the multiple multigranulation rough set needs more attributes to form a reduct.
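The counting idea behind the multiset approximations can be sketched as follows. This is an illustrative reading of the abstract, not the authors' definition: it assumes each binary relation is an equivalence relation given as a partition of the universe, and the names `block_of` and `multiset_approximations` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple

Partition = List[Set[Hashable]]


def block_of(partition: Partition, x: Hashable) -> Set[Hashable]:
    """Return the equivalence class of x under one relation (given as a partition)."""
    for block in partition:
        if x in block:
            return block
    raise ValueError(f"{x!r} is not covered by the partition")


def multiset_approximations(universe: Iterable[Hashable],
                            partitions: List[Partition],
                            target: Set[Hashable]) -> Tuple[Counter, Counter]:
    """Count, per object, how many relations place it in the lower/upper approximation."""
    lower: Counter = Counter()
    upper: Counter = Counter()
    for x in universe:
        for partition in partitions:
            block = block_of(partition, x)
            if block <= target:   # containment: class lies inside the target
                lower[x] += 1
            if block & target:    # intersection: class overlaps the target
                upper[x] += 1
    return lower, upper


# Small example with two relations over a four-element universe.
universe = [1, 2, 3, 4]
partitions = [[{1, 2}, {3, 4}], [{1}, {2, 3}, {4}]]
lower, upper = multiset_approximations(universe, partitions, {1, 2})

# Under the usual definitions, the optimistic lower approximation needs at least
# one containing relation and the pessimistic one needs all of them.
optimistic_lower = {x for x in universe if lower[x] >= 1}
pessimistic_lower = {x for x in universe if lower[x] == len(partitions)}
```

Under these assumptions the two classical lower approximations fall out as thresholds on the same counts, which suggests why a count-based quality measure would lie between the optimistic and pessimistic ones.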