The task of quantification consists in providing an aggregate estimate (e.g., the class distribution in a classification problem) for unseen test sets, applying a model trained on a training set with a different data distribution. Several real-world applications demand methods of this kind, which do not require predictions for individual examples and focus only on obtaining accurate estimates at an aggregate level. Over the past few years, several quantification methods have been proposed from different perspectives and with different goals. This paper presents a unified review of the main approaches with the aim of serving as an introductory tutorial for newcomers to the field.
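To make the setting concrete, here is a minimal sketch (an illustration, not code from the review) of two classic quantification baselines from the literature, Classify & Count (CC) and Adjusted Count (AC):

```python
def classify_and_count(preds):
    """Classify & Count (CC): the estimated positive prevalence is simply
    the fraction of positive predictions on the test set."""
    return sum(preds) / len(preds)

def adjusted_count(preds, tpr, fpr):
    """Adjusted Count (AC): correct CC with the classifier's true- and
    false-positive rates, estimated on held-out training data:
        p = (cc - fpr) / (tpr - fpr), clipped to [0, 1]."""
    cc = classify_and_count(preds)
    p = (cc - fpr) / (tpr - fpr)
    return min(max(p, 0.0), 1.0)

# Toy check: with tpr=0.8 and fpr=0.2, a test set whose true positive
# prevalence is 0.7 yields, in expectation, cc = 0.7*0.8 + 0.3*0.2 = 0.62;
# AC recovers 0.7 from that biased estimate.
preds = [1] * 62 + [0] * 38
print(classify_and_count(preds))                  # 0.62
print(adjusted_count(preds, tpr=0.8, fpr=0.2))    # ~0.7
```

The AC correction is what makes quantification more than counting classifier outputs: it explicitly models the classifier's systematic errors.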
In recent decades, the automatic study and analysis of plankton communities using imaging techniques has advanced significantly. The effectiveness of these automated systems appears to have improved, reaching acceptable levels of accuracy. However, plankton ecologists often find that classification systems do not work as well as expected when applied to new samples. This paper proposes a methodology to assess the efficacy of learned models that takes into account the fact that the data distribution (the plankton composition of the sample) can vary between the model-building phase and the production phase. As opposed to most validation methods, which consider the individual organism as the unit of validation, our approach validates by sample, which is more appropriate when the objective is to estimate the abundance of different morphological groups. We argue that, in these cases, the base unit for correctly estimating the error is the sample, not the individual. Thus, model assessment processes require groups of samples with sufficient variability in order to provide precise error estimates.
The study of marine plankton data is vital to monitoring the health of the world's oceans. In recent decades, automatic plankton recognition systems have proved useful for addressing the vast amount of data collected by specially engineered in situ digital imaging systems. Initially, these systems were developed and put into operation using traditional automatic classification techniques fed with hand-designed local image descriptors (such as Fourier features), obtaining quite successful results. In the past few years, there have been many advances in the computer vision community with the rebirth of neural networks. In this paper, we show that descriptors computed using convolutional neural networks trained with out-of-domain data can replace hand-designed descriptors in the task of estimating the prevalence of each plankton class in a water sample. To achieve this goal, we have designed a broad set of experiments that show how effective these deep features are when working in combination with state-of-the-art quantification algorithms.
This paper presents a new approach for solving binary quantification problems based on nearest neighbor (NN) algorithms. Our main objective is to study the behavior of these methods in the context of prevalence estimation. We seek NN-based quantifiers able to provide competitive performance while balancing simplicity and effectiveness. We propose two simple weighting strategies, PWK and PWKα, which stand out among state-of-the-art quantifiers. These proposed methods are the only ones that show statistically significant differences with respect to less robust algorithms, such as CC or AC. The second contribution of the paper is a new experimental methodology for quantification.
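The general idea behind class-weighted nearest-neighbor quantification can be sketched as follows. This is a generic illustration only, not the authors' exact PWK or PWKα algorithm: a kNN classifier whose votes are down-weighted by training-class size, plugged into a classify-and-count estimate.

```python
from collections import Counter
import math

def weighted_knn_cc(X_train, y_train, X_test, k=3):
    """Hypothetical sketch: kNN with inverse class-frequency vote weights
    (so the majority class does not dominate), used as a classify-and-count
    quantifier for binary labels {0, 1}."""
    counts = Counter(y_train)
    weight = {c: 1.0 / n for c, n in counts.items()}
    positives = 0
    for x in X_test:
        # Euclidean distance from x to every training point, nearest first.
        neighbors = sorted(
            (math.dist(x, xt), yt) for xt, yt in zip(X_train, y_train)
        )
        votes = {0: 0.0, 1: 0.0}
        for _, yt in neighbors[:k]:
            votes[yt] += weight[yt]
        positives += votes[1] > votes[0]
    return positives / len(X_test)  # estimated positive prevalence
```

The weighting matters for quantification because an unweighted kNN trained on imbalanced data systematically drags prevalence estimates toward the majority class.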
The effect of fluctuating ultraviolet radiation levels on the copepod Boeckella gracilipes was investigated in Lake Escondido (Patagonia, Argentina). The animals were incubated either at fixed depths or rotating in in situ plankton wheels of different diameters. The observed mortality was significantly higher in rotating treatments. Static incubations can be used to predict the mortality of vertically moving B. gracilipes, provided that the doses of UVA and UVB are known. The results suggest that under moderate wind conditions, the plankton of shallow lakes are exposed to potentially damaging levels of solar radiation, even in relatively turbid waters.
There are real applications that do not demand classifying or making predictions about individual objects, but rather estimating some magnitude over a group of them. For instance, one of these cases arises in sentiment analysis and opinion mining. Some applications require classifying opinions as positive or negative, but there are others, sometimes even more useful, that just need an estimate of the proportion of each class during a given period of time. "How many tweets about our new product were positive yesterday?" Practitioners should apply quantification algorithms to tackle this kind of problem, instead of just using off-the-shelf classification methods, because classifiers are suboptimal in the context of quantification tasks. Unfortunately, quantification learning is still a relatively underexplored area in machine learning. The goal of this paper is to show that quantification learning is an interesting open problem. To support this claim, we present an application analyzing Twitter comments in which even the simplest quantification methods outperform classification approaches.
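Why counting classifier outputs is suboptimal for prevalence estimation can be seen with a two-line expected-value calculation (a generic illustration, not code from the paper):

```python
def cc_expected_estimate(prevalence, tpr, tnr):
    """Expected classify-and-count estimate for a classifier with the given
    per-class accuracies: expected true positives plus false positives."""
    return prevalence * tpr + (1 - prevalence) * (1 - tnr)

# A sentiment classifier that is 90% accurate on both classes still nearly
# doubles a 10% positive rate: 0.1*0.9 + 0.9*0.1 = 0.18.
print(cc_expected_estimate(0.10, tpr=0.9, tnr=0.9))
```

The bias grows as the test-set prevalence drifts away from the 50% point where the two error types cancel, which is exactly the distribution-shift scenario quantification methods are designed for.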
Isotope Dilution Mass Spectrometry (IDMS) has become an essential tool in research laboratories and is increasingly used in routine analysis labs (including environmental, food safety and clinical applications). This is the first textbook to present a comprehensive and instructive view of the theory and applications of this growing technique. The main objective of this book is to cover the theory and applications of Isotope Dilution in Analytical Chemistry. The scope is comprehensive, including elemental analysis, speciation analysis, organic analysis and biochemical and clinical analysis, together with applications in metabolism studies and traceability of goods. Until now there have been no books published with the same general scope (only book chapters on particular applications). This is a textbook aimed at postgraduate level, covering the basic knowledge required for doctoral studies in this field. Isotope Dilution Mass Spectrometry will also outline practical applications of interest for routine testing laboratories where isotope dilution procedures are implemented or can be implemented in the future. This unique book covers all the theoretical and practical aspects of Isotope Dilution Mass Spectrometry (IDMS). Due to the increasing application of IDMS in many research laboratories and the increasing implementation of IDMS methodologies in routine testing laboratories, scientists in industry and those working in or affiliated with this area will find this an invaluable source of information. Concerning the theoretical aspects, the authors present a uniform theoretical background which grows from previous developments in Organic, Speciation and Elemental analysis, both in their own laboratory and in other laboratories around the world. This general approach will be simpler and will also include new emerging fields such as quantitative proteomics and metabolism studies.