Abstract: In Machine Learning, it is common to distinguish different degrees of supervision, ranging from fully supervised to completely unsupervised scenarios. Lying in between these, however, the Learning from Label Proportions (LLP) setting [19] assumes the training data is provided in the form of bags, and the only supervision comes through the proportion of each class in each bag. In this paper, we present a novel version of the LLP paradigm where the relationship among the classes is ordinal. While this is a highl…
“…This approach enjoys uniform convergence under the following (strong) assumption: conditioned on its label, a point is independent of its bag assignment i, namely, p(x|y, i) = p(x|y). Extensions of MeanMap can be found in [12,13].…”
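Under this independence assumption, each bag's feature mean is a proportion-weighted mixture of bag-independent class means, which is the identity MeanMap exploits. The synthetic sketch below illustrates only that recovery step via least squares (the data, bag construction, and variable names are illustrative, not the full MeanMap estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes with known means (ground truth for the sketch).
mu_true = np.array([[0.0, 0.0], [3.0, 3.0]])

# Build bags with known per-bag label proportions.
n_bags, bag_size = 20, 200
pi = rng.dirichlet([1.0, 1.0], size=n_bags)        # (n_bags, 2) class proportions
bag_means = np.empty((n_bags, 2))
for i in range(n_bags):
    n1 = int(round(pi[i, 1] * bag_size))
    x = np.vstack([
        rng.normal(mu_true[0], 1.0, size=(bag_size - n1, 2)),
        rng.normal(mu_true[1], 1.0, size=(n1, 2)),
    ])
    bag_means[i] = x.mean(axis=0)

# Under p(x|y, i) = p(x|y), each bag mean is a mixture of the
# bag-independent class means:  m_i ≈ Σ_y π_{iy} μ_y.
# Stacking over bags gives the linear system  Π @ M = bag_means,
# so the class means can be recovered by least squares.
mu_hat, *_ = np.linalg.lstsq(pi, bag_means, rcond=None)
```

With enough bags of varied proportions, `mu_hat` closely approximates `mu_true`; the recovered class means can then feed a downstream classifier, as in MeanMap.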
Learning with Label Proportions (LLP) is the problem of recovering the underlying true labels of a dataset when the data is presented in the form of bags. This paradigm is particularly suitable in contexts where providing individual labels is expensive and label aggregates are more easily obtained. In the healthcare domain, for example, it is a burden for a patient to keep a detailed diary of their daily routines, but they will often be willing to provide higher-level summaries of daily behavior. We present a novel and efficient graph-based algorithm that encourages local smoothness and exploits the global structure of the data, while preserving the 'mass' of each bag.
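The high-level recipe in this abstract, smoothing labels over a similarity graph while keeping each bag's class mass fixed, can be illustrated with a toy alternating scheme. This is a hedged sketch under assumed data and bag assignments, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D data: two clusters, each forming one bag with known proportions.
x = np.concatenate([rng.normal(-2, 0.3, 30), rng.normal(2, 0.3, 30)])
bags = np.array([0] * 30 + [1] * 30)            # hypothetical bag assignment
bag_props = {0: np.array([0.8, 0.2]),           # given class proportions per bag
             1: np.array([0.1, 0.9])}

# Gaussian-kernel affinity graph, row-normalised into a diffusion operator.
W = np.exp(-(x[:, None] - x[None, :]) ** 2)
np.fill_diagonal(W, 0.0)
P = W / W.sum(axis=1, keepdims=True)

# Start from uniform soft labels, then alternate (i) smoothing over the
# graph (local smoothness / global structure) and (ii) rescaling each
# bag's columns so its total class 'mass' matches the given proportions.
F = np.full((x.size, 2), 0.5)
for _ in range(50):
    F = P @ F                                   # graph smoothing step
    for b, props in bag_props.items():
        idx = bags == b
        mass = F[idx].sum(axis=0)
        F[idx] *= (props * idx.sum()) / mass    # preserve bag mass
        F[idx] /= F[idx].sum(axis=1, keepdims=True)

labels = F.argmax(axis=1)
```

Here the bags happen to coincide with the clusters, so the smoothed soft labels settle to each bag's majority class; in realistic data, bags mix points from many regions of the graph.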
“…In Poyiadzi et al (2022), the authors introduced a framework that follows from the asymptotic properties of Maximum Likelihood Estimation (MLE) of the logistic regression model and thus its correctness relies on whether the assumptions of the model are met. In practice, the linear assumptions behind the method mean that, unless there is an element of control over the data generating process, the blind application of the test can lead to suboptimal outcomes.…”
Section: Introduction
confidence: 99%
“…• We extend the parametric form of the hypothesis test for class-conditional label noise proposed in Poyiadzi et al (2022), and consider a nonparametric estimation of the underlying regression function based on local likelihood models.
• We thoroughly compare the strengths and weaknesses of these two methods, and provide guidelines for machine learning practitioners on which one to use for a given dataset.…”
Section: Introduction
confidence: 99%
“…Testing for Linearity In this work, we introduce an extension of the tests in Poyiadzi et al (2022) that replaces the parametric model with a nonparametric one. For a practitioner, it is necessary to know whether a parametric (or linear) fit is suitable for their needs.…”
In supervised learning, automatically assessing the quality of the labels before any learning takes place remains an open research question. In certain cases, hypothesis testing procedures have been proposed to assess whether a given instance-label dataset is contaminated with class-conditional label noise, as opposed to uniform label noise. The existing theory builds on the asymptotic properties of the Maximum Likelihood Estimate for parametric logistic regression. However, the parametric assumptions on which these approaches are built are often too strong and unrealistic in practice. To alleviate this problem, in this paper we propose an alternative path, showing how similar procedures can be followed when the underlying model is obtained by Local Maximum Likelihood Estimation, which leads to more flexible nonparametric logistic regression models that are less susceptible to model misspecification. This view widens the applicability of the tests by giving users access to a richer model class. As in existing works, we assume access to anchor points provided by the users. We introduce the ingredients needed to adapt the hypothesis tests to the case of nonparametric logistic regression, empirically compare against the parametric approach on both synthetic and real-world case studies, and discuss the advantages and limitations of the proposed approach.
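Local Maximum Likelihood Estimation of a logistic model can be sketched as a kernel-weighted Newton-Raphson fit of a local-linear logistic regression around each query point. The 1-D example below is illustrative only; the kernel, bandwidth, and synthetic data are assumptions, not the paper's setup:

```python
import numpy as np

def local_logistic(x_query, X, y, h=0.5, n_iter=25):
    """Local MLE sketch: kernel-weighted logistic fit around x_query,
    returning the estimated P(Y=1 | x_query)."""
    w = np.exp(-0.5 * ((X - x_query) / h) ** 2)      # Gaussian kernel weights
    A = np.column_stack([np.ones_like(X), X - x_query])
    beta = np.zeros(2)
    for _ in range(n_iter):                          # weighted Newton-Raphson
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        g = A.T @ (w * (y - p))                      # weighted score vector
        H = (A * (w * p * (1 - p))[:, None]).T @ A   # weighted information matrix
        beta += np.linalg.solve(H, g)
    return 1.0 / (1.0 + np.exp(-beta[0]))            # intercept = fit at the query

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, 500)
p_true = 1 / (1 + np.exp(-3 * np.sin(2 * X)))        # nonlinear regression function
y = (rng.uniform(size=500) < p_true).astype(float)
est = local_logistic(0.75, X, y)
```

A single global logistic regression would be misspecified for this sinusoidal truth, whereas the local fit tracks it; this flexibility is what motivates building the hypothesis test on the nonparametric model.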
“…Although this relaxes the assumption of infallibility of the oracle, we argue that exploring varying degrees of supervision can lead to an easier and simpler interaction between the algorithm and the oracle. In this paper, we cast our problem as an instance of the Learning from Label Proportions (LLP) setting [8,9,10,11,12]. Figure 1 illustrates this idea.…”
Active Learning (AL) refers to the setting where the learner can query an oracle to acquire the true label of an instance or, sometimes, of a set of instances. Even though Active Learning has been studied extensively, the setting usually rests on the assumption that the oracle is trustworthy and will provide the actual label. We argue that, while common, this approach can be made more flexible to account for different forms of supervision. In this paper, we propose a new framework that allows the algorithm to request the label for a whole bag of samples at a time. Although this label comes in the form of class-label proportions within the bag, and therefore encodes less information, we demonstrate that we can still learn effectively.
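The interaction loop described here, where queries return bag-level proportions rather than instance labels, can be sketched minimally as follows. The oracle, the bag construction, and the uncertainty heuristic are all illustrative assumptions, not the paper's framework:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical pool split into fixed bags; the oracle returns only the
# class proportion of a queried bag, never individual labels.
X = rng.normal(size=(100, 2))
true_y = (X[:, 0] + X[:, 1] > 0).astype(int)
bags = [np.arange(i, i + 10) for i in range(0, 100, 10)]

def oracle_proportion(bag):
    """Weak supervision: fraction of positive labels in the bag."""
    return true_y[bag].mean()

def query_most_uncertain(model_probs, bags, queried):
    """Pick the unqueried bag whose mean predicted probability is closest
    to 0.5 (a simple uncertainty heuristic, assumed for illustration)."""
    scores = [abs(model_probs[b].mean() - 0.5) if i not in queried else np.inf
              for i, b in enumerate(bags)]
    return int(np.argmin(scores))

probs = np.full(100, 0.5)          # untrained model: maximally uncertain everywhere
choice = query_most_uncertain(probs, bags, queried=set())
prop = oracle_proportion(bags[choice])
```

After each such query, the learner would update its model from the accumulated bag proportions (e.g. with an LLP method) and re-score the remaining bags before querying again.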
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.