Classification problems with uneven class distributions present several difficulties during the training as well as during the evaluation process of classifiers. A classification problem with such characteristics has resulted from a data-mining project where the objective was to predict customer insolvency. Using the dataset from the customer insolvency problem we study several alternative methodologies which have been reported to better suit the specific characteristics of this type of problems. Three different but equally important directions are examined; (a) the performance measures that should be used for problems in this domain, (b) the class distributions that should be used for the training data sets, (c) the classification algorithms to be used. The final evaluation of the resulting classifiers is based on a study of the economic impact of classification results. This study concludes to a framework that provides the "best" classifiers, identifies the performance measures that should be used as the decision criterion and suggests the "best" class distribution based on the value of the relative gain from correct classification in the positive class.This framework has been applied in the customer insolvency problem, but it is claimed that it can be applied to many similar problems with uneven class distributions that almost always require a multi-objective evaluation proces.
The High School Timetabling Problem is amongst the most widely used timetabling problems. This problem has varying structures in different high schools even within the same country or educational system. Due to lack of standard benchmarks and data formats this problem has been studied less than other timetabling problems in the literature. In this paper we describe the High School Timetabling Problem in several countries in order to find a common set of constraints and objectives. Our main goal is to provide exchangeable benchmarks for this problem. To achieve this we propose a standard data format suitable for different countries and educational systems, defined by an XML schema. The schema and datasets are available online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.