Naive Bayes is among the simplest probabilistic classifiers. It often performs surprisingly well in many real world applications, despite the strong assumption that all features are conditionally independent given the class. In the learning process of this classifier with the known structure, class probabilities and conditional probabilities are calculated using training data, and then values of these probabilities are used to classify new observations. In this paper, we introduce three novel optimization models for the naive Bayes classifier where both class probabilities and conditional probabilities are considered as variables. The values of these variables are found by solving the corresponding optimization problems. Numerical experiments are conducted on several real world binary classification data sets, where continuous features are discretized by applying three different methods. The performances of these models are compared with the naive Bayes classifier, tree augmented naive Bayes, the SVM, C4.5 and the nearest neighbor classifier. The obtained results demonstrate that the proposed models can significantly improve the performance of the naive Bayes classifier, yet at the same time maintain its simple structure.
The proximal gradient algorithm has been popularly used for convex optimization. Recently, it has also been extended for nonconvex problems, and the current state-of-the-art is the nonmonotone accelerated proximal gradient algorithm. However, it typically requires two exact proximal steps in each iteration, and can be inefficient when the proximal step is expensive. In this paper, we propose an efficient proximal gradient algorithm that requires only one inexact (and thus less expensive) proximal step in each iteration. Convergence to a critical point is still guaranteed and has a O(1/k) convergence rate, which is the best rate for nonconvex problems with first-order methods. Experiments on a number of problems demonstrate that the proposed algorithm has comparable performance as the state-of-the-art, but is much faster.
Problems of data classification can be studied in the framework of regularization theory as ill-posed problems. In this framework, loss functions play an important role in the application of regularization theory to classification. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss, logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, ϕ-loss, ramp loss, normalized sigmoid loss, and the loss function of 2 layer neural network. Based on the analysis of these loss functions, we propose a new differentiable nonconvex loss function, called smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two binary classification algorithms for binary classification, one for convex loss functions, the other for non-convex loss functions. A set of experiments are launched on several binary data sets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially for those noisy data sets with many outliers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.