The past decade has seen significant interest in the problem of inducing decision trees that take account of both the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that directly adapt accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.
Decision tree induction is a widely used technique for learning from data, which first emerged in the 1980s. In recent years, several authors have noted that in practice accuracy alone is not adequate, and that it has become increasingly important to take into account the cost of misclassifying data. Several authors have therefore developed techniques to induce cost-sensitive decision trees. Although many studies include pair-wise comparisons of algorithms, no earlier work has compared a large number of methods. This paper aims to remedy this situation by investigating different cost-sensitive decision tree induction algorithms. A survey identified 30 cost-sensitive decision tree algorithms, which can be organized into 10 categories. A representative sample of these algorithms has been implemented and an empirical evaluation carried out. In addition, an accuracy-based look-ahead algorithm has been extended into a new cost-sensitive look-ahead algorithm and also evaluated. The main outcome of the evaluation is that an algorithm based on genetic algorithms, known as Inexpensive Classification with Expensive Tests (ICET), performed best across the full range of experiments, showing that to make a decision tree cost-sensitive it is better to include all the different types of costs, that is, both the cost of obtaining the data and the misclassification costs, in the induction of the decision tree.
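The idea of folding acquisition cost into induction can be illustrated with the Information Cost Function used by EG2-style algorithms, which trades information gain against the cost of acquiring an attribute's value. The sketch below is minimal and the candidate attributes and their gains and costs are purely illustrative, not taken from any of the surveyed experiments:

```python
def icf(info_gain, test_cost, w=1.0):
    """EG2-style Information Cost Function.

    Rewards attributes with high information gain but penalizes
    expensive tests; w in [0, 1] controls how strongly cost matters
    (w=0 reduces the criterion to a pure function of information gain).
    """
    return (2 ** info_gain - 1) / ((test_cost + 1) ** w)

# Illustrative candidates: attribute -> (information gain, acquisition cost).
candidates = {
    "blood_test": (0.40, 7.0),
    "age":        (0.25, 0.0),
    "x_ray":      (0.55, 20.0),
}

# Split on the attribute with the highest cost-adjusted score.
best = max(candidates, key=lambda a: icf(*candidates[a]))
```

Here the cheap attribute "age" wins despite its lower gain, which is exactly the behaviour a purely accuracy-based criterion would miss.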
This paper develops a new algorithm for inducing cost-sensitive decision trees that is inspired by the multi-armed bandit problem, in which a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game theory proposes a solution to this multi-armed bandit problem through a process of exploration and exploitation in which reward is maximized. This paper utilizes these concepts to develop a new algorithm by viewing the rewards as a reduction in costs, and utilizing the exploration and exploitation techniques so that a compromise between decisions based on accuracy and decisions based on costs can be found. The algorithm employs the notion of lever pulls in the multi-armed bandit game to select attributes during decision tree induction, using a lookahead methodology to explore potential attributes and exploit the attribute that maximizes the reward. The new algorithm is evaluated on fifteen datasets and compared to six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results show that the new multi-armed bandit based algorithm can produce more cost-effective trees without compromising accuracy. The paper also includes a critical appraisal of the limitations of the new algorithm and proposes avenues for further research.
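The exploration-and-exploitation mechanism described above can be sketched with a standard UCB1 bandit rule: each candidate attribute is an arm, a "lever pull" is one lookahead evaluation, and the reward is the cost reduction that evaluation reports. This is a generic illustration of the idea, not the paper's own algorithm; the function name and the deterministic `evaluate_cost_reduction` callback are assumptions made for the example:

```python
import math

def ucb1_select_attribute(attributes, evaluate_cost_reduction,
                          n_pulls=50, c=math.sqrt(2)):
    """Select the attribute whose simulated lever pulls (lookahead
    evaluations) yield the highest estimated cost reduction, balancing
    exploration and exploitation with the UCB1 rule."""
    counts = {a: 0 for a in attributes}
    totals = {a: 0.0 for a in attributes}

    # Explore: pull each lever once so every attribute has an estimate.
    for a in attributes:
        totals[a] += evaluate_cost_reduction(a)
        counts[a] = 1

    for t in range(len(attributes), n_pulls):
        # UCB1 score: mean reward plus a bonus for rarely tried arms.
        def ucb(a):
            return (totals[a] / counts[a]
                    + c * math.sqrt(math.log(t + 1) / counts[a]))
        arm = max(attributes, key=ucb)
        totals[arm] += evaluate_cost_reduction(arm)
        counts[arm] += 1

    # Exploit: commit to the best average cost reduction observed.
    return max(attributes, key=lambda a: totals[a] / counts[a])
```

In a tree inducer, this selection would be run once per node, with the evaluator scoring each candidate split on the costs of its lookahead subtree.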
The advent of price and product comparison sites now makes it even more important to retain customers and to identify those at risk of leaving. The use of data mining methods has been widely advocated for predicting customer churn. This paper presents two case studies that utilize decision tree learning methods to develop models for predicting churn for a software company. The first case study aims to predict churn for organizations that currently have an ongoing project, to determine whether they are likely to continue with other projects, while the second presents a more traditional example, where the aim is to predict which organizations are likely to cease subscribing to a service. The case studies present the accuracy of the models using a standard methodology and compare the results with what happened in practice. Both case studies show the significant savings that can be made, as well as the potential increase in revenue, by using decision tree learning for churn analysis.