Abstract: A single linear program is proposed for discriminating between the elements of k disjoint point sets in the n-dimensional real space R^n. When the conical hulls of the k sets are (k-1)-point disjoint in R^(n+1), a k-piece piecewise-linear surface generated by the linear program completely separates the k sets. This improves on a previous linear programming approach which required that each set be linearly separable from the remaining k-1 sets. When the conical hulls of the k sets are not (k-1)-point disjoint, the …
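The abstract above is truncated, so as a hedged, minimal sketch of the general idea (not the paper's single k-class program), the following solves the closely related two-class robust LP of Bennett and Mangasarian, which picks a plane w·x = γ minimizing the average violations of the separation constraints. The function name and formulation details are illustrative.

```python
# Minimal sketch, assuming the two-class robust LP: minimize the mean
# violations of the separation constraints for a plane w.x = gamma.
# Not the cited paper's single k-class program; names are illustrative.
import numpy as np
from scipy.optimize import linprog

def rlp_separate(A, B):
    """A: (m, n) points of class 1, B: (k, n) points of class 2.
    Returns (w, gamma) with A ideally on w.x > gamma and B on w.x < gamma."""
    m, n = A.shape
    k = B.shape[0]
    # decision vector: [w (n), gamma (1), y (m slacks for A), z (k slacks for B)]
    c = np.concatenate([np.zeros(n + 1), np.full(m, 1.0 / m), np.full(k, 1.0 / k)])
    # A_i.w - gamma + y_i >= 1   rewritten as  -A_i.w + gamma - y_i <= -1
    G1 = np.hstack([-A, np.ones((m, 1)), -np.eye(m), np.zeros((m, k))])
    # -B_j.w + gamma + z_j >= 1  rewritten as   B_j.w - gamma - z_j <= -1
    G2 = np.hstack([B, -np.ones((k, 1)), np.zeros((k, m)), -np.eye(k)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + k)  # slacks nonneg
    res = linprog(c, A_ub=np.vstack([G1, G2]), b_ub=-np.ones(m + k),
                  bounds=bounds, method="highs")
    return res.x[:n], res.x[n]
```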
“…We comment that the original bound in [192] contained an extra factor of log N multiplying VCdim(F) in (15).…”
Section: Theorem 2 ([192])
“…In this section we present a brief survey of several extensions and generalizations, although many others exist, e.g. [76,158,62,18,31,3,155,15,44,52,82,164,66,38,137,147,16,80,185,100,14].…”
Section: Extensions
“…One uses the example weights computed at each iteration to determine which examples are highly influential and hard to classify. Assuming that the hard examples are "noisy examples", the algorithm chooses the mistrust parameter at iteration t, ζ_n^(t), as the amount by which the example (x_n, y_n) influenced the decision in previous iterations. [Footnote 15: Note that we use w to mark the parameters of the hyperplane in feature space and α to denote the coefficients generated by the algorithm (see Section 4.1).]…”
Section: Reducing the Influence of Examples
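The quoted snippet elides the exact formula for ζ_n^(t); the sketch below is one plausible reading, assuming an AdaBoost-style loop in which each example's accumulated weight across rounds serves as its influence score, from which an illustrative (not the paper's) mistrust value is derived.

```python
# Hedged sketch: AdaBoost with decision stumps, tracking how much weight
# each example accumulates over the rounds ("influence"). The final
# normalization into a mistrust score zeta is an illustrative choice.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_with_influence(X, y, rounds=50):
    """y in {-1, +1}. Returns the ensemble and per-example influence scores."""
    y = np.asarray(y)
    n = len(y)
    d = np.full(n, 1.0 / n)      # example weights
    influence = np.zeros(n)      # cumulative weight each example carried
    ensemble = []
    for t in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=d)
        pred = stump.predict(X)
        eps = d[pred != y].sum()
        if not 0.0 < eps < 0.5:  # degenerate weak learner; stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        influence += d           # hard examples keep high weight, gain influence
        d *= np.exp(-alpha * y * pred)
        d /= d.sum()
    total = influence.sum()
    zeta = influence / total if total > 0 else influence  # illustrative mistrust
    return ensemble, zeta
```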
Abstract. We provide an introduction to theoretical and practical aspects of Boosting and Ensemble learning, providing a useful reference for researchers in the field of Boosting as well as for those seeking to enter this fascinating area of research. We begin with a short background concerning the necessary learning theoretical foundations of weak learners and their linear combinations. We then point out the useful connection between Boosting and the Theory of Optimization, which facilitates the understanding of Boosting and later on enables us to move on to new Boosting algorithms, applicable to a broad spectrum of problems. In order to increase the relevance of the paper to practitioners, we have added remarks, pseudo code, "tricks of the trade", and algorithmic considerations where appropriate. Finally, we illustrate the usefulness of Boosting algorithms by giving an overview of some existing applications. The main ideas are illustrated on the problem of binary classification, although several extensions are discussed.
“…Perceptron Decision Trees (PDT) have been introduced by a number of authors under different names (Mangasarian et al., 1990; Bennett & Mangasarian, 1992, 1994a, 1994b; Breiman et al., 1984; Brodley & Utgoff, 1995; Utgoff, 1989; Murthy, Kasif, & Salzberg, 1994). They are decision trees in which each internal node is associated with a hyperplane in general position in the input space.…”
Section: Introduction
“…They are decision trees in which each internal node is associated with a hyperplane in general position in the input space. They have been used in many real-world pattern classification tasks with good results (Bennett & Mangasarian, 1994a; Murthy, Kasif, & Salzberg, 1994; Bennett, Wu, & Auslender, 1998). Given their high flexibility, a feature that they share with more standard decision trees such as the ones produced by C4.5 (Quinlan, 1993), they tend to overfit the data if their complexity is not somehow kept under control.…”
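To make the structure just described concrete, here is a minimal sketch (names and layout are illustrative, not from the paper): each internal node carries a hyperplane (w, b) and routes a point by the sign of w·x + b, and leaves carry class labels.

```python
# Minimal sketch of a perceptron decision tree: internal nodes hold a
# hyperplane in general position; leaves hold class labels.
import numpy as np
from dataclasses import dataclass
from typing import Optional

@dataclass
class PDTNode:
    w: Optional[np.ndarray] = None    # hyperplane normal (internal nodes)
    b: float = 0.0                    # hyperplane offset
    left: Optional["PDTNode"] = None
    right: Optional["PDTNode"] = None
    label: Optional[int] = None       # class label (leaves)

def pdt_predict(node: PDTNode, x: np.ndarray) -> int:
    """Route x down the tree: left if w.x + b <= 0, right otherwise."""
    while node.label is None:
        node = node.left if node.w @ x + node.b <= 0 else node.right
    return node.label
```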
Abstract. Capacity control in perceptron decision trees is typically performed by controlling their size. We prove that other quantities can be as relevant to reduce their flexibility and combat overfitting. In particular, we provide an upper bound on the generalization error which depends both on the size of the tree and on the margin of the decision nodes. Thus, enlarging the margin in perceptron decision trees reduces the upper bound on generalization error. Based on this analysis, we introduce three new algorithms, which can induce large margin perceptron decision trees. To assess the effect of the large margin bias, OC1 (Journal of Artificial Intelligence Research, 1994, 2, 1-32) of Murthy, Kasif, and Salzberg, a well-known system for inducing perceptron decision trees, is used as the baseline algorithm. An extensive experimental study on real-world data showed that all three new algorithms perform better than, or at least not significantly worse than, OC1 on every dataset but one, and OC1 performed worse than the best margin-based method on every dataset.
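As a hedged illustration of the margin quantity the bound above depends on (the helper below is hypothetical, not from the paper): the geometric margin of a decision node is the smallest distance from its hyperplane to any training point routed through that node, and the three algorithms mentioned bias training toward enlarging it.

```python
# Hypothetical helper: geometric margin of one decision node, i.e. the
# minimum distance |w.x + b| / ||w|| over the points reaching that node.
import numpy as np

def node_margin(w: np.ndarray, b: float, X_routed: np.ndarray) -> float:
    return float(np.min(np.abs(X_routed @ w + b)) / np.linalg.norm(w))
```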
Decision trees are one of the most widely used models for prediction tasks in machine learning, data mining, and statistics. The representation language of a standard decision tree uses splitting tests based on a single input variable and constants at the tree leaves. The most recent advances in learning decision trees from data involve splitting tests based on combinations of the input variables and functions at the tree leaves. Models of this type are termed functional trees. In this article, we present the issues and current trends in learning functional trees from data.
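A minimal sketch of the functional-tree idea under stated assumptions: internal nodes may split on a learned combination of the inputs (here a logistic-regression score against a median-based target, an ad hoc illustrative choice) and leaves may hold functions rather than constants (here a linear model). None of these specifics come from the article.

```python
# Hedged sketch of a functional tree for regression: multivariate splits,
# linear models at the leaves. The split construction is illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

class FunctionalNode:
    def __init__(self, X, y, depth=0, max_depth=2):
        X, y = np.asarray(X), np.asarray(y)
        self.split, self.leaf_model = None, None
        labels = y > np.median(y)                  # provisional split target
        if depth < max_depth and labels.any() and not labels.all():
            split = LogisticRegression().fit(X, labels)
            mask = split.predict(X).astype(bool)
            if mask.any() and not mask.all():      # usable, non-degenerate split
                self.split = split
                self.left = FunctionalNode(X[~mask], y[~mask], depth + 1, max_depth)
                self.right = FunctionalNode(X[mask], y[mask], depth + 1, max_depth)
        if self.split is None:
            self.leaf_model = LinearRegression().fit(X, y)  # function at the leaf

    def predict_one(self, x):
        if self.split is None:
            return float(self.leaf_model.predict(x.reshape(1, -1))[0])
        go_right = bool(self.split.predict(x.reshape(1, -1))[0])
        return (self.right if go_right else self.left).predict_one(x)
```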