Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008)
DOI: 10.3115/1599081.1599224

Active learning with sampling by uncertainty and density for word sense disambiguation and text classification

Abstract: This paper addresses two issues in active learning. First, to counter the known failure mode of uncertainty sampling, which often selects outliers, it presents a new selective sampling technique, sampling by uncertainty and density (SUD), in which a k-Nearest-Neighbor-based density measure is adopted to determine whether an unlabeled example is an outlier. Second, a technique of sampling by clustering (SBC) is applied to build a representative initial training data set for active learning. Finally, we…
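
The SBC step described in the abstract is compact enough to illustrate. Below is a minimal sketch of one plausible reading, assuming k-means over the unlabeled pool with the instance nearest each centroid taken as the seed set; the function name sbc_seed_indices and the use of scikit-learn's KMeans are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def sbc_seed_indices(X_pool, n_seeds=10, random_state=0):
    """Sampling-by-clustering (SBC) sketch: cluster the unlabeled pool and
    return the index of the instance closest to each cluster centroid."""
    km = KMeans(n_clusters=n_seeds, n_init=10, random_state=random_state).fit(X_pool)
    # Distance from every pool instance to every centroid, shape (n, n_seeds).
    dists = np.linalg.norm(X_pool[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    # For each cluster, the pool instance nearest its centroid becomes a seed.
    return np.unique(dists.argmin(axis=0))
```

The returned indices would then be sent to an annotator to form the initial labeled set, after which the active-learning loop proper can begin.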

Cited by 119 publications (97 citation statements) · References 12 publications

“…When the core of the model is consolidated, items with the highest uncertainty should yield a larger performance improvement by delimiting the model's decision frontier more precisely. This phenomenon, which lies at the heart of well-known semi-supervised learning techniques like self-training (or bootstrapping), has also been noted by approaches that combine density estimation methods when very few examples are available with uncertainty sampling once the training dataset has grown [5,17].…”
Section: Relevant Work (mentioning)
confidence: 99%
“…The cold start problem has long been known to be a key difficulty in building effective classifiers quickly and cheaply via AL [13,16]. Since the quality of data selection depends directly on the understanding of the space provided by the "current" model, early stages of acquisition can result in a vicious cycle: uninformative selections lead to poor-quality models, which in turn lead to further poor selections.…”
Section: Starting Cold (mentioning)
confidence: 99%
“…Zhu et al. [13] developed a technique similar to the information density technique of Settles and Craven, selecting instances according to an uncertainty-based criterion modified by a density factor: U_N(x) = U(x) · KNN(x), where KNN(x) is the average cosine similarity of the K nearest neighbors to x. The same authors also propose sampling by clustering, a density-only AL heuristic in which the problem space is clustered and the points closest to the cluster centroids are selected for labeling.…”
Section: Density-Sensitive Active Learning (mentioning)
confidence: 99%
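
The criterion quoted above is simple enough to sketch in code. The snippet below is a minimal illustration assuming entropy as the uncertainty measure U(x) and cosine similarity for the neighborhood term; the helper names and the choice of entropy are assumptions, not fixed by the citation.

```python
import numpy as np

def knn_density(X, k=10):
    """KNN(x): average cosine similarity of each instance to its k nearest
    neighbors (assumes the pool holds more than k instances)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                      # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)       # exclude self-similarity
    return np.sort(S, axis=1)[:, -k:].mean(axis=1)  # mean of k largest

def entropy_uncertainty(probs):
    """U(x): prediction entropy over each instance's class posteriors."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def sud_scores(probs, X, k=10):
    """SUD criterion as quoted: U_N(x) = U(x) * KNN(x)."""
    return entropy_uncertainty(probs) * knn_density(X, k)
```

An active learner would then query the pool instance with the largest score, e.g. X_pool[sud_scores(probs, X_pool).argmax()], so that uncertain but isolated outliers (low KNN(x)) are down-weighted.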
“…For example, [19] weights the uncertainty of an instance by its density to avoid outliers, where the density of an instance is defined as its average similarity to other instances. [20] used a K-Nearest-Neighbor-based density measure to determine whether an unlabeled instance is an outlier. [9] proposed a hybrid approach combining representative sampling and uncertainty sampling.…”
Section: Related Work (mentioning)
confidence: 99%
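
The density weighting attributed to [19] differs from the k-NN variant sketched above in that the density term averages similarity over the whole pool rather than over the k nearest neighbors. A minimal sketch, assuming cosine similarity and a tunable exponent beta on the density term (the exponent and the function name are illustrative assumptions):

```python
import numpy as np

def information_density_scores(uncertainty, X, beta=1.0):
    """Weight each instance's uncertainty by its average cosine similarity
    to all other pool instances, raised to the power beta."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)                 # ignore self-similarity
    density = S.sum(axis=1) / (len(X) - 1)   # average similarity to the rest
    return uncertainty * density ** beta
```

With beta = 0 this reduces to plain uncertainty sampling, which makes the density term easy to ablate.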