The task of quantification consists in providing an aggregate estimation (e.g. the class distribution in a classification problem) for unseen test sets, applying a model that is trained using a training set with a different data distribution. Several real-world applications demand this kind of methods that do not require predictions for individual examples and just focus on obtaining accurate estimates at an aggregate level. During the past few years, several quantification methods have been proposed from different perspectives and with different goals. This paper presents a unified review of the main approaches with the aim of serving as an introductory tutorial for newcomers in the field.
The study of marine plankton data is vital to monitor the health of the world’s oceans. In recent decades, automatic plankton recognition systems have proved useful to address the vast amount of data collected by specially engineered in situ digital imaging systems. At the beginning, these systems were developed and put into operation using traditional automatic classification techniques, which were fed with hand-designed local image descriptors (such as Fourier features), obtaining quite successful results. In the past few years, there have been many advances in the computer vision community with the rebirth of neural networks. In this paper, we leverage how descriptors computed using convolutional neural networks trained with out-of-domain data are useful to replace hand-designed descriptors in the task of estimating the prevalence of each plankton class in a water sample. To achieve this goal, we have designed a broad set of experiments that show how effective these deep features are when working in combination with state-of-the-art quantification algorithms.
Ensembles are among the most effective and successful methods for almost all supervised tasks. Not long ago, an ensemble approach has been proposed for quantification learning. The idea of such method is to exploit the prior knowledge about quantification tasks, building ensembles in which diversity is achieved by training each model with a different distribution. These training samples are generated taking into account the expected drift in class distribution. This paper extends this method proposing three new quantifier selection criteria particularly devised for quantification problems, where two of them are defined for dynamic ensemble selection. The experiments demonstrate that, in many cases, these selection functions outperform straightforward approaches, like averaging all models and using quantification accuracy to prune the ensemble. Moreover, the results show that performance heavily depends on the combination of the base quantification algorithm and the selection measure.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.