Abstract. This paper introduces two statistical approaches for detecting outliers on a per-class basis. Experiments on binary and multi-class classification problems reveal that partially removing outliers significantly improves one or two performance measures for the C4.5 and 1-nearest-neighbour classifiers. A taxonomy of problems according to their amount of outliers is also proposed.
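The abstract does not spell out the statistical rule used to flag outliers within each class, so the following is only a minimal per-class z-score sketch; the function name, the threshold, and the "any feature beyond z_thresh standard deviations from its class mean" rule are illustrative assumptions, not the paper's method:

```python
from statistics import mean, pstdev

def class_outlier_mask(X, y, z_thresh=2.0):
    """Flag samples that deviate strongly from their OWN class.

    Hypothetical rule: a sample is an outlier if any of its features lies
    more than z_thresh standard deviations from that feature's class mean.
    Computing statistics per class avoids flagging a sample merely because
    its class occupies an unusual region of the overall feature space.
    """
    n_features = len(X[0])
    outliers = [False] * len(y)
    for c in set(y):
        idx = [i for i, label in enumerate(y) if label == c]
        for j in range(n_features):
            col = [X[i][j] for i in idx]
            mu, sd = mean(col), pstdev(col)
            if sd == 0:          # constant feature within this class
                continue
            for i in idx:
                if abs(X[i][j] - mu) / sd > z_thresh:
                    outliers[i] = True
    return outliers
```

A "partial removal" in the abstract's sense would then keep only the samples where the mask is False before training the classifier.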
This paper combines feature selection methods with a two-stage evolutionary classifier based on product unit neural networks. The enhanced methodology has been tried out with four filters on 18 data sets whose test error rates are about 20 % or above with reference classifiers such as C4.5 or 1-NN. The proposal has also been evaluated on a real-world liver-transplantation problem whose data distribution is problematic and on which classifiers achieve low performance. The study includes an overall empirical comparison between the models obtained with and without feature selection, using different kinds of neural networks, such as RBF and MLP networks, as well as other state-of-the-art classifiers. Statistical tests show that our proposal significantly improves the test accuracy of the previous models. The reduction in the number of inputs is, on average, above 55 %, so greater efficiency is achieved.
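The four filters used in the study are not named in the abstract. To illustrate the general idea of a filter (a classifier-independent ranking of features, applied before training), here is a minimal correlation-based sketch; `pearson` and `filter_select` are hypothetical helpers, not the paper's filters:

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:       # a constant sequence carries no signal
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

def filter_select(X, y, k):
    """Rank features by |correlation with the class label| and keep the top k.

    Returns the sorted indices of the k selected features; the classifier
    is then trained only on those columns.
    """
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])
```

A reduction above 55 % in the number of inputs, as reported in the abstract, would correspond to choosing `k` well below half the original feature count.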
Abstract. Digital forensics research comprises several stages. Once the data have been collected, the final goal is to obtain a model that predicts the output for unseen data. We focus on supervised machine learning techniques. This chapter performs an experimental study on a multi-class forensics classification task, covering several families of methods: decision trees, Bayes classifiers, rule-based methods, artificial neural networks and nearest-neighbour methods. The classifiers have been evaluated with two performance measures: accuracy and Cohen's kappa. The experimental design is a 4-fold cross-validation with thirty repetitions for the non-deterministic algorithms, averaging the results of 120 runs in order to obtain reliable estimates. A statistical analysis compares each pair of algorithms by means of t-tests on both the accuracy and Cohen's kappa metrics.
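Cohen's kappa, one of the two measures above, corrects raw accuracy for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the chance agreement implied by the label marginals. A small self-contained implementation (the function name is ours; the formula is the standard definition):

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between truth and predictions beyond chance."""
    n = len(y_true)
    # Observed agreement: fraction of exact matches (i.e. plain accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: probability both sides pick the same label at random,
    # given each side's label frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[l] * pred_counts[l] for l in labels) / n ** 2
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```

On imbalanced multi-class forensics data this matters: a classifier that always predicts the majority class can score high accuracy but a kappa near zero.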
Abstract. This paper introduces a methodology that improves, by means of feature selection, the accuracy of a two-stage algorithm for evolutionary product unit neural networks in classification tasks. Two filters have been considered to evaluate the proposal. The experimentation has been carried out on seven data sets from the UCI repository whose mean test error rates are about 20 % or above with reference classifiers such as C4.5 or 1-NN. The study includes an overall empirical comparison between the models obtained with and without feature selection. Several classifiers have also been tested in order to illustrate the performance of the different filters considered. The results have been contrasted with nonparametric statistical tests and show that our proposal significantly improves the test accuracy of the previous models for the considered data sets. Moreover, the current proposal is much more efficient than a previous methodology developed by us; lastly, the reduction in the number of inputs is above 55 %, on average.