2011 IEEE 11th International Conference on Data Mining Workshops
DOI: 10.1109/icdmw.2011.20

Active Learning from Positive and Unlabeled Data

Abstract: During recent years, active learning has evolved into a popular paradigm for utilizing user feedback to improve the accuracy of learning algorithms. Active learning works by selecting the most informative sample among the unlabeled data and querying the label of that point from the user. Many different methods, such as uncertainty sampling and minimum risk sampling, have been used to select the most informative sample in active learning. Although many active learning algorithms have been proposed so far, most…
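The pool-based loop the abstract describes can be made concrete with a short sketch. The following least-confidence uncertainty-sampling loop is a generic illustration of the paradigm, not the algorithm proposed in the paper; the toy data, the `oracle` stand-in for the human annotator, and the query budget are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy labeled seed set (two separated blobs) and an unlabeled pool.
X_labeled = np.vstack([rng.normal(-1.0, 1.0, size=(5, 2)),
                       rng.normal(1.0, 1.0, size=(5, 2))])
y_labeled = np.array([0] * 5 + [1] * 5)
X_pool = rng.normal(0.0, 1.5, size=(200, 2))

def oracle(x):
    # Stand-in for the human annotator whose label is queried.
    return int(x[0] > 0)

clf = LogisticRegression()
for _ in range(20):  # query budget
    clf.fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_pool)
    # Least confidence: query the point whose top-class probability is
    # lowest, i.e. the point the current model is most uncertain about.
    idx = int(np.argmin(proba.max(axis=1)))
    x_query = X_pool[idx]
    X_labeled = np.vstack([X_labeled, x_query])
    y_labeled = np.append(y_labeled, oracle(x_query))
    X_pool = np.delete(X_pool, idx, axis=0)

print("labeled set size after querying:", len(y_labeled))
```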

Cited by 22 publications (14 citation statements)
References 16 publications
“…Positive and unlabeled (PU) data can be regarded as a kind of noisy-label data, in which we mainly consider the probability that positive samples are mislabeled as negative ones. Ghasemi et al. [12] proposed an active learning algorithm for PU data, which works by separately estimating the probability densities of the positive and unlabeled points and then computing the expected value of informativeness, removing a hyperparameter and giving a better measure of informativeness. Plessis et al. [33] proposed a cost-sensitive classifier, which utilizes a non-convex loss to prevent a superfluous penalty term in the objective function.…”
Section: Noisy-Label Robust Learning
confidence: 99%
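The strategy paraphrased in the statement above can be sketched roughly as follows: fit separate kernel density estimates to the positive and the unlabeled points, turn the density ratio into a posterior, and query the most ambiguous pool point. The class prior `pi`, the bandwidths, and the use of binary entropy as the informativeness score are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
X_pos = rng.normal(1.0, 1.0, size=(50, 2))    # labeled positive samples
X_unl = rng.normal(0.0, 1.5, size=(300, 2))   # unlabeled pool

# Separate density estimates for the positive and the unlabeled data.
kde_pos = KernelDensity(bandwidth=0.5).fit(X_pos)
kde_unl = KernelDensity(bandwidth=0.5).fit(X_unl)

pi = 0.5  # assumed positive-class prior (illustrative, not from the paper)
p_x = np.exp(kde_unl.score_samples(X_unl))    # marginal density of the pool
p_pos = np.exp(kde_pos.score_samples(X_unl))  # positive-class density
post = np.clip(pi * p_pos / np.maximum(p_x, 1e-12), 1e-12, 1.0 - 1e-12)

# Binary entropy as the informativeness score; the most ambiguous point wins.
entropy = -(post * np.log(post) + (1.0 - post) * np.log(1.0 - post))
query_idx = int(np.argmax(entropy))
print("query point:", X_unl[query_idx])
```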
“…These actions could consist of labeling part of the unseen data so that it can be used to retrain a supervised machine learning algorithm (e.g. active learning [2,3]) and quickly generalize to the unseen data, or of using methods such as data augmentation [4], transfer learning [1,5,6] or representation learning [7,8], which are commonly used to extend the scope of machine learning algorithms.…”
Section: Introduction
confidence: 99%
“…Besides the three proposed query strategies, our experiments considered the active learning approaches "lh" uncertainty sampling [5], expected margin sampling [7], entropy sampling [7], outlier sampling [6], and random sampling. The two kernel density estimation strategies of [7] were implemented using the publicly available code by Ghasemi [18]. For large datasets, batches of multiple samples were queried after each iterative training step to reduce the computational effort.…”
Section: A. Setup
confidence: 99%
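The batch querying mentioned in the setup can be illustrated with a minimal helper: instead of retraining after every single label, the `k` highest-scoring pool points are queried per iteration. The batch size and the scores here are placeholders; the cited experiments do not specify them in this excerpt.

```python
import numpy as np

def select_batch(scores, k=10):
    # scores: per-point informativeness, higher = more informative.
    # Returns indices of the k highest-scoring pool points, best first.
    return np.argsort(scores)[-k:][::-1]

scores = np.random.default_rng(2).random(500)  # placeholder scores
batch_idx = select_batch(scores, k=10)
print(batch_idx)
```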
“…Ghasemi et al. presented a different approach, which makes use of the distributions of the target and unlabeled samples and does not consider the classification results of an OCC for active learning. Based on kernel density estimation, they proposed two query strategies: expected margin sampling and entropy sampling [7]. Further query strategies exist but are not considered in this paper because they are limited to special OCCs or computationally expensive.…”
Section: Introduction
confidence: 99%
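A margin-style counterpart to the entropy score from the earlier sketch would query the point whose two class posteriors are closest. Whether this simple margin matches the paper's "expected margin sampling" exactly is an assumption; in particular, the expectation over the unknown label is omitted here.

```python
import numpy as np

def margin_query(post):
    # post: estimated p(y=1 | x) for each unlabeled point.
    # The margin |p(+|x) - p(-|x)| is smallest where the two classes are
    # closest to equally likely; that point is queried.
    margin = np.abs(2.0 * post - 1.0)
    return int(np.argmin(margin))

post = np.random.default_rng(3).random(300)  # placeholder posteriors
print("query index:", margin_query(post))
```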