Analysis of the consistency of a mixed integer programming-based multi-category constrained discriminant model

Brooks, J. Paul; Lee, Eva K.

doi:10.1007/s10479-008-0424-0

Cited by 38 publications

(20 citation statements)

References 23 publications


“…Traditional SVM is tuned for the same parameter values except for the loss function. Classification trees are tuned for the split criterion (Gini or information) and k-nearest neighbor is tuned for k (1,3,4,7,9) and distance function (L1 and L2). Random forests and logistic regression are used with default settings for all tests.…”

Section: Comparisons With Other Classifiers (mentioning)

confidence: 99%

“…Steinwart [36] proves that SVM with the traditional hinge loss is universally consistent. Brooks and Lee [7] prove that an integer-programming based method for constrained discrimination, a generalization of the classification problem, is consistent. This paper presents new integer programming formulations for SVM with the ramp loss and hard margin loss that accommodate the use of nonlinear kernel functions and the quadratic margin term.…”

Section: Introduction (mentioning)

confidence: 95%


Support Vector Machines with the Ramp Loss and the Hard Margin Loss

Brooks

2011

Operations Research

Self Cite


In the interest of deriving classifiers that are robust to outlier observations, we present integer programming formulations of Vapnik's support vector machine (SVM) with the ramp loss and hard margin loss. The ramp loss allows a maximum error of 2 for each training observation, while the hard margin loss calculates error by counting the number of training observations that are misclassified outside of the margin. SVM with these loss functions is shown to be a consistent estimator when used with certain kernel functions. Based on results on simulated and real-world data, we conclude that SVM with the ramp loss is preferred to SVM with the hard margin loss. Data sets for which robust formulations of SVM perform comparatively better than the traditional formulation are characterized with theoretical and empirical justification. Solution methods are presented that reduce computation time over industry-standard integer programming solvers alone.
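The cap of 2 described in the abstract can be illustrated with a small sketch. It assumes the common formulation in which the ramp loss is the hinge loss truncated at 2, written in terms of the signed margin `y * f(x)`; the function names and sample margins below are hypothetical and are not taken from the paper.

```python
def hinge_loss(margin: float) -> float:
    """Traditional hinge loss on the signed margin y * f(x); unbounded above."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin: float) -> float:
    """Hinge loss truncated at 2 (assumed form of the ramp loss)."""
    return min(2.0, hinge_loss(margin))


# Hypothetical signed margins: well classified, inside the margin,
# misclassified, and an extreme outlier.
margins = [2.0, 0.5, -1.0, -10.0]

for m in margins:
    print(f"margin={m:6.1f}  hinge={hinge_loss(m):5.1f}  ramp={ramp_loss(m):4.1f}")
```

Under this assumed form, an outlier far on the wrong side of the boundary contributes at most 2 to the ramp loss, whereas its hinge loss grows without bound. This is one way to read the robustness claim in the abstract.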


“…The DAMIP classification model, a general-purpose, optimization-based, predictive modeling framework, has proven to be a very powerful supervised learning classification approach in predicting various biomedical and clinical phenomena [22][23][24] due to the universal consistency of the resulting classification rules and their ability to classify with high accuracy even among small training sets [25]. Fifty percent of the practices were randomly selected as the training set, 7 practices in group 1 and 9 in group 2. The DAMIP model was then applied to this training set to develop the prediction rule and to obtain an unbiased estimate of correct classification.…”

Section: Classification Analyses (mentioning)

confidence: 99%

Practice Characteristics That Influence Nonurgent Pediatric Emergency Department Utilization

Sturm

Hirsh

Lee

et al. 2010

Academic Pediatrics

Self Cite


“…6 and 7, surrogate functions provide a poor trade-off between accuracy and sparsity. Convex surrogate loss functions, for instance, produce models that do not attain the best learning-theoretic guarantee on predictive accuracy and are not robust to outliers (Li and Lin 2007; Brooks and Lee 2010; Nguyen and Sanner 2013). Similarly, ℓ1-regularization is only guaranteed to recover the correct sparse solution (i.e., the one that minimizes the ℓ0-norm) under restrictive conditions that are rarely satisfied in practice (Zhao and Bin 2007; Liu and Zhang 2009).…”

Section: Sparse Linear Classification Models (mentioning)

confidence: 99%

Supersparse linear integer models for optimized medical scoring systems

2015


Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by using an integer programming problem that directly encodes measures of accuracy (the 0-1 loss) and sparsity (the ℓ0-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce acceptable models without parameter tuning because of the direct control provided over these quantities. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM is being used to create a highly tailored scoring system for sleep apnea screening.
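The two quantities named in the abstract, the 0-1 loss and the ℓ0 size of the coefficient vector, can be evaluated for a fixed scoring system as in the sketch below. It only scores a given integer coefficient vector and does not solve the integer program. The trade-off weight `c0`, the toy data, and all function names are hypothetical, and the actual SLIM objective may include further terms.

```python
from typing import Sequence


def score(coefs: Sequence[int], x: Sequence[float]) -> float:
    """Linear score: the sum of coefficient * feature value."""
    return sum(c * v for c, v in zip(coefs, x))


def zero_one_loss(coefs, X, y) -> float:
    """Fraction of observations with y * score <= 0 (labels in {-1, +1})."""
    errors = sum(1 for xi, yi in zip(X, y) if yi * score(coefs, xi) <= 0)
    return errors / len(y)


def l0_size(coefs: Sequence[int]) -> int:
    """Number of nonzero coefficients (here the intercept is counted too)."""
    return sum(1 for c in coefs if c != 0)


def penalized_objective(coefs, X, y, c0: float) -> float:
    """Assumed objective: 0-1 loss plus c0 times the number of nonzeros."""
    return zero_one_loss(coefs, X, y) + c0 * l0_size(coefs)


# Hypothetical toy data: the first column is a constant 1 for the intercept.
X = [[1, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 0], [1, 0, 0, 1]]
y = [1, -1, -1, 1]
coefs = [-2, 1, 0, 3]  # small integer coefficients, one of them zero

print("0-1 loss:", zero_one_loss(coefs, X, y))
print("nonzero coefficients:", l0_size(coefs))
print("objective:", penalized_objective(coefs, X, y, c0=0.01))
```

Because both terms are counted directly rather than approximated by a convex surrogate, a weight such as `c0` can be read as the accuracy one is willing to give up per additional nonzero coefficient, which is the kind of direct control the abstract describes.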
