Ozone and PM10 constitute the major concern for air quality in Milan. This paper addresses the problem of predicting these two pollutants, using several statistical approaches to this end. In particular, feed-forward neural networks (FFNNs), currently recognized as the state-of-the-art approach for statistical prediction of air quality, are compared with two alternative approaches derived from machine learning: pruned neural networks (PNNs) and lazy learning (LL). PNNs constitute a parameter-parsimonious approach, based on the removal of redundant parameters from fully connected neural networks; LL, on the other hand, is a local linear prediction algorithm, which performs a local learning procedure each time a prediction is required. All three approaches are tested in the prediction of ozone and PM10; predictors are trained to return at 9 a.m. the concentration estimated for the current day. No strong differences are found between the forecast accuracies of the different models; nevertheless, LL provides the best performance on indicators related to the average goodness of the prediction (correlation, mean absolute error, etc.), while PNNs are superior to the other approaches in detecting exceedances of alarm and attention thresholds. In some cases, data deseasonalization is found to improve the prediction accuracy of the models. Finally, some striking features of lazy learning deserve consideration: the LL predictor can be quickly designed, and, thanks to the simplicity of the local linear regressors, it both avoids overfitting problems and can be readily interpreted; moreover, it can also easily be kept up-to-date.
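The lazy-learning scheme described above can be illustrated with a minimal sketch: nothing is fitted in advance, and each query triggers a local least-squares fit on its k nearest neighbours. The function name, the Euclidean neighbourhood, and the fixed k are illustrative assumptions, not the paper's exact procedure (which tunes the neighbourhood adaptively).

```python
import numpy as np

def lazy_predict(X, y, x_query, k=5):
    """Lazy-learning sketch: when a prediction is requested, select the
    k training points nearest to the query and fit a local linear model
    on them only. No global model is ever trained."""
    # Euclidean distance from the query to every training point
    d = np.linalg.norm(X - x_query, axis=1)
    idx = np.argsort(d)[:k]
    # Local design matrix with an intercept column
    A = np.hstack([np.ones((k, 1)), X[idx]])
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    # Evaluate the local linear model at the query point
    return float(np.concatenate([[1.0], x_query]) @ coef)
```

Because each prediction fits a fresh linear model on recent data, keeping the predictor up-to-date amounts to appending new rows to `X` and `y`, which is the "easily kept up-to-date" property noted above.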
Predictions made by imprecise-probability models are often indeterminate (that is, set-valued). Measuring the quality of an indeterminate prediction by a single number is important in order to fairly compare different models, but a principled approach to this problem is currently missing. In this paper we derive, from a set of assumptions, a metric to evaluate the predictions of credal classifiers. These are supervised learning models that issue set-valued predictions. The metric turns out to be composed of an objective component and another that is related to the decision-maker's degree of risk aversion to the variability of predictions. We discuss when the measure can be rendered independent of such a degree, and provide insights as to how the comparison of classifiers based on the new measure changes with the number of predictions to be made. Finally, we make extensive empirical tests of credal, as well as precise, classifiers using the new metric. This shows the practical usefulness of the metric, while yielding a first insightful and extensive comparison of credal classifiers.
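A common instantiation of this kind of metric, sketched below under the assumption that it matches the structure described in the abstract, combines an objective component (discounted accuracy: the reciprocal of the prediction-set size when the set contains the true label) with a concave utility expressing risk aversion; the `u65` quadratic, which maps a discounted accuracy of 0.5 to a score of 0.65, is one utility used in the credal-classification literature. Function names are illustrative.

```python
def discounted_accuracy(prediction_set, true_label):
    """Objective component: 1/|Y| if the set Y contains the true label,
    0 otherwise. A precise, correct prediction scores 1."""
    if true_label in prediction_set:
        return 1.0 / len(prediction_set)
    return 0.0

def u65(x):
    """Risk-averse quadratic utility: u65(0) = 0, u65(0.5) = 0.65,
    u65(1) = 1. Rewards indeterminate-but-correct predictions more
    than plain discounted accuracy does."""
    return -0.6 * x**2 + 1.6 * x

def utility_score(predictions, truths, utility=u65):
    """Average utility-discounted accuracy over a test set."""
    scores = [utility(discounted_accuracy(p, t))
              for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)
```

For example, a classifier that answers `{'a', 'b'}` when the truth is `'a'` earns discounted accuracy 0.5 and utility 0.65, sitting between a wrong precise answer (0) and a correct precise one (1).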
Failure management plays a role of capital importance in optical networks to avoid service disruptions and to satisfy customers' service level agreements. Machine Learning (ML) promises to revolutionize the mostly manual and human-driven approaches by which failure management in optical networks has traditionally been handled, by introducing automated methods for failure prediction, detection, localization and identification. This tutorial provides a gentle introduction to some ML techniques that have been recently applied in the field of optical-network failure management. It then introduces a taxonomy to classify failure-management tasks and discusses possible applications of ML to these tasks. Finally, for a reader interested in implementation details, we provide a step-by-step description of how to solve a representative example of a practical failure-management task.
Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (NHST). Yet NHST suffers from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We propose a Bayesian hierarchical model which jointly analyzes the cross-validation results obtained by two classifiers on multiple data sets. It returns the posterior probability of the accuracies of the two classifiers being practically equivalent or significantly different. A further strength of the hierarchical model is that, by jointly analyzing the results obtained on all data sets, it reduces the estimation error compared to the usual approach of averaging the cross-validation results obtained on a given data set.
G. Corani, A. Benavoli, F. Mangili and M. Zaffalon are with Istituto Dalle Molle di studi sull'Intelligenza Artificiale (IDSIA), Manno, Switzerland. J. Demšar is with
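The "practically equivalent or significantly different" verdict above is typically obtained by summarising the posterior over the accuracy difference against a region of practical equivalence (ROPE). A minimal sketch, assuming posterior samples of the accuracy difference are already available (the hierarchical model itself is not reproduced here) and an illustrative ROPE half-width of 0.01:

```python
def rope_probabilities(delta_samples, rope=0.01):
    """Summarise posterior samples of the accuracy difference
    delta = acc(B) - acc(A) into three probabilities:
    P(A practically better), P(practically equivalent), P(B practically better).
    `rope` is the half-width of the region of practical equivalence."""
    n = len(delta_samples)
    p_left = sum(d < -rope for d in delta_samples) / n
    p_rope = sum(-rope <= d <= rope for d in delta_samples) / n
    p_right = sum(d > rope for d in delta_samples) / n
    return p_left, p_rope, p_right
```

Unlike an NHST p-value, these three numbers directly answer the question of interest: how probable it is that the difference in accuracy is negligible, rather than merely whether a point-null hypothesis can be rejected.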