Estimating labels from label proportions

Quadrianto, Novi; Smola, Alex; Caetano, Tibério; Le, Quoc V.

doi:10.1145/1390156.1390254

Cited by 108 publications

(165 citation statements)

References 14 publications

Supporting

Mentioning: 163

Contrasting


“…Additionally, we will consider combining labeled and unlabeled data using semi-supervised learning from label proportions (Quadrianto et al, 2009; Ganchev et al, 2010; Mann and McCallum, 2010).…”

Section: Discussion (mentioning)

confidence: 99%

Using County Demographics to Infer Attributes of Twitter Users

Mohammady

Culotta

2014

Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media


Social media are increasingly being used to complement traditional survey methods in health, politics, and marketing. However, little has been done to adjust for the sampling bias inherent in this approach. Inferring demographic attributes of social media users is thus a critical step to improving the validity of such studies. While there have been a number of supervised machine learning approaches to this problem, these rely on a training set of users annotated with attributes, which can be difficult to obtain. We instead propose training a demographic attribute classifier that uses county-level supervision. By pairing geolocated social media with county demographics, we build a regression model mapping text to demographics. We then adopt this model to make predictions at the user level. Our experiments using Twitter data show that this approach is surprisingly competitive with a fully supervised approach, estimating the race of a user with 80% accuracy.


“…We have compared the LLP algorithm to three state-of-the-art methods for learning from label proportions: The Mean Map method [19], Inverse Calibration (Invcal) [21] and AOC Kernel k-Means (AOC-KK) [6]. For a further discussion of these methods, see Sect.…”

Section: Methods (mentioning)

confidence: 99%

“…Quadrianto et al [19] have proposed the Mean Map method which estimates the conditional class probability P(Y | X, θ) by conditional exponential models, using a feature map Φ(X, Y) and a normalization function g:…”

Section: Related Work (mentioning)

confidence: 99%
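
The statement above breaks off before giving the formula. As a rough illustration only, a conditional exponential model is commonly written as p(y | x, θ) = exp(⟨Φ(x, y), θ⟩ − g(θ | x)), where g is the log of the normalizing sum over classes. The sketch below assumes that form. The block-structured feature map and all function names are hypothetical choices made for this example, not the construction used in the cited papers.

```python
import math


def joint_features(x, y, num_classes):
    """Hypothetical feature map Phi(x, y): copy x into the block owned by class y."""
    phi = [0.0] * (len(x) * num_classes)
    start = y * len(x)
    phi[start:start + len(x)] = x
    return phi


def score(x, y, theta, num_classes):
    """Inner product <Phi(x, y), theta>."""
    phi = joint_features(x, y, num_classes)
    return sum(p * t for p, t in zip(phi, theta))


def log_partition(x, theta, num_classes):
    """g(theta | x) = log sum_y exp(<Phi(x, y), theta>), computed stably."""
    scores = [score(x, y, theta, num_classes) for y in range(num_classes)]
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


def conditional_prob(x, y, theta, num_classes):
    """p(y | x, theta) = exp(<Phi(x, y), theta> - g(theta | x))."""
    return math.exp(score(x, y, theta, num_classes)
                    - log_partition(x, theta, num_classes))


if __name__ == "__main__":
    x = [1.0, 2.0]
    theta = [0.3, -0.2, 0.1, 0.4]  # one weight block per class (assumed layout)
    for y in range(2):
        print(y, conditional_prob(x, y, theta, num_classes=2))
```

Subtracting g(θ | x) in the exponent is what makes the class probabilities sum to one for every x, which is the role the statement assigns to the normalization function.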

Learning from Label Proportions by Optimizing Cluster Model Selection

Stolpe

Morik

2011

Machine Learning and Knowledge Discovery in Databases


Abstract. In a supervised learning scenario, we learn a mapping from input to output values, based on labeled examples. Can we learn such a mapping also from groups of unlabeled observations, only knowing, for each group, the proportion of observations with a particular label? Solutions have real world applications. Here, we consider groups of steel sticks as samples in quality control. Since the steel sticks cannot be marked individually, for each group of sticks it is only known how many sticks of high (low) quality it contains. We want to predict the achieved quality for each stick before it reaches the final production station and quality control, in order to save resources. We define the problem of learning from label proportions and present a solution based on clustering. Our method empirically shows a better prediction performance than recent approaches based on probabilistic SVMs, Kernel k-Means or conditional exponential models.


“…But naive grid labeling provides too little information (only the label of the major class in a cell) on sample classes, which will inevitably introduce label uncertainty and eventually reduce the classification accuracy. According to the idea of learning with label proportions [9][10][11], we can learn a model to predict labels of the individual samples by grouping the training samples and providing proportions of the labels in each group. However, the current definition of label proportion [11] ignores spatial sources of samples in each group, which means samples in a group are not necessarily from the same local region, making it inconvenient for sample labeling of remote sensing images.…”

Section: Proportional Grid Labeling (mentioning)

confidence: 99%

“…The issue of learning from label proportions is attracting attention in the machine learning area [9][10][11][12]. For learning from label proportions, the training samples are divided into groups and label proportions of samples in each group are given as sample truth, instead of giving the label of each sample in the training set [11].…”

Section: Introduction (mentioning)

confidence: 99%
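
The training setup described in the statement above (samples divided into groups, with only per-group label proportions given) can be made concrete with a small sketch. The group names, labels, and helper functions below are hypothetical and purely illustrative. In a real problem the per-sample labels would be hidden; they appear here only to generate the proportions that a learner would actually receive.

```python
from collections import Counter


def label_proportions(labels):
    """Fraction of each label within one group of samples."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def proportion_gap(predicted_labels, target):
    """L1 distance between predicted and given label proportions for one group."""
    predicted = label_proportions(predicted_labels)
    classes = set(predicted) | set(target)
    return sum(abs(predicted.get(c, 0.0) - target.get(c, 0.0)) for c in classes)


# Hypothetical groups; the individual labels are normally unobserved.
groups = {
    "g1": ["high", "high", "low", "high"],
    "g2": ["low", "low", "high", "low"],
}

# What the learner is given: one proportion vector per group.
training_signal = {name: label_proportions(lbls) for name, lbls in groups.items()}

if __name__ == "__main__":
    print(training_signal)
    # How far a candidate labeling of g1 is from the given proportions.
    print(proportion_gap(["high", "low", "low", "high"], training_signal["g1"]))
```

A gap of zero means a candidate labeling reproduces the given proportions exactly. Many different labelings can do so, which is why methods in this area add further modeling assumptions on top of the proportions.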

Learning from label proportions for SAR image classification

Ding

Wang

2017

EURASIP J. Adv. Signal Process.


Synthetic aperture radar (SAR) image classification plays a key role in SAR interpretation. Due to the cost and difficulty of truth labeling for SAR images, the newly labeled samples available for image classification are very limited. This paper focuses on defining a new sample labeling method to solve the problem of truth acquisition for training data in SAR image classification. An efficient classification framework for high-resolution SAR images is presented in this paper, which is built on learning from uncertain labels. We use grid labeling for rapid training data acquisition by assigning a label to a group of neighboring pixels at a time. A novel SVM-based learning model is proposed to optimize the uncertain training data within the constraints of label proportions in each group and then predict the label of each sample for the test data based on the optimized training set. This work intends to explore a rapid labeling method called grid labeling for efficient training set definition and apply it to large-scale SAR image classification. The model demonstrates good performance in both accuracy and efficiency for scene interpretation of high-resolution SAR images.
