Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008)
DOI: 10.3115/1599081.1599224

Active learning with sampling by uncertainty and density for word sense disambiguation and text classification

Abstract: This paper addresses two issues in active learning. First, to counter the known failure mode of uncertainty sampling, which often selects outliers, it presents a new selective sampling technique, sampling by uncertainty and density (SUD), in which a k-Nearest-Neighbor-based density measure is adopted to determine whether an unlabeled example is an outlier. Second, a technique of sampling by clustering (SBC) is applied to build a representative initial training data set for active learning. Finally, we…
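
The SBC step described in the abstract is compact enough to illustrate. Below is a minimal sketch of one plausible reading, assuming k-means over the unlabeled pool with the instance nearest each centroid taken as the seed set; the function name sbc_seed_indices and the use of scikit-learn's KMeans are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def sbc_seed_indices(X_pool, n_seeds=10, random_state=0):
    """Sampling-by-clustering (SBC) sketch: cluster the unlabeled pool and
    return the index of the instance closest to each cluster centroid."""
    km = KMeans(n_clusters=n_seeds, n_init=10, random_state=random_state).fit(X_pool)
    # Distance from every pool instance to every centroid, shape (n, n_seeds).
    dists = np.linalg.norm(X_pool[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    # For each cluster, the pool instance nearest its centroid becomes a seed.
    return np.unique(dists.argmin(axis=0))
```

The returned indices would then be sent to an annotator to form the initial labeled set, after which the active-learning loop proper can begin.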

Cited by 119 publications (97 citation statements) · References 12 publications

“…When the core of the model is consolidated, items with the highest uncertainty should yield a larger performance improvement by delimiting the model's decision frontier more precisely. This phenomenon, which lies at the heart of well-known semi-supervised learning techniques like self-training (or bootstrapping), has also been noted by approaches that combine density estimation methods when very few examples are available with uncertainty sampling once the training dataset has grown [5,17].…”
Section: Relevant Work (mentioning)
confidence: 99%
“…The cold start problem has long been known to be a key difficulty in building effective classifiers quickly and cheaply via AL [13,16]. Since the quality of data selection depends directly on the understanding of the space provided by the "current" model, early stages of acquisition can result in a vicious cycle: uninformative selections lead to poor-quality models, which in turn lead to further poor selections.…”
Section: Starting Cold (mentioning)
confidence: 99%
“…Zhu et al. [13] developed a technique similar to the information density technique of Settles and Craven, selecting instances according to an uncertainty-based criterion modified by a density factor: U_N(x) = U(x) · KNN(x), where KNN(x) is the average cosine similarity of the K nearest neighbors to x. The same authors also propose sampling by clustering, a density-only AL heuristic in which the problem space is clustered and the points closest to the cluster centroids are selected for labeling.…”
Section: Density-Sensitive Active Learning (mentioning)
confidence: 99%
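
The criterion quoted above is simple enough to sketch in code. The snippet below is a minimal illustration assuming entropy as the uncertainty measure U(x) and cosine similarity for the neighborhood term; the helper names and the choice of entropy are assumptions, not fixed by the citation.

```python
import numpy as np

def knn_density(X, k=10):
    """KNN(x): average cosine similarity of each instance to its k nearest
    neighbors (assumes the pool holds more than k instances)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                      # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)       # exclude self-similarity
    return np.sort(S, axis=1)[:, -k:].mean(axis=1)  # mean of k largest

def entropy_uncertainty(probs):
    """U(x): prediction entropy over each instance's class posteriors."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def sud_scores(probs, X, k=10):
    """SUD criterion as quoted: U_N(x) = U(x) * KNN(x)."""
    return entropy_uncertainty(probs) * knn_density(X, k)
```

An active learner would then query the pool instance with the largest score, e.g. X_pool[sud_scores(probs, X_pool).argmax()], so that uncertain but isolated outliers (low KNN(x)) are down-weighted.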
“…For example, [19] weights the uncertainty of an instance by its density to avoid outliers, where the density of an instance is defined as its average similarity to other instances. [20] used a K-Nearest-Neighbor-based density measure to determine whether an unlabeled instance is an outlier. [9] proposed a hybrid approach combining representative sampling and uncertainty sampling.…”
Section: Related Work (mentioning)
confidence: 99%
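
The density weighting attributed to [19] differs from the k-NN variant sketched above in that the density term averages similarity over the whole pool rather than over the k nearest neighbors. A minimal sketch, assuming cosine similarity and a tunable exponent beta on the density term (the exponent and the function name are illustrative assumptions):

```python
import numpy as np

def information_density_scores(uncertainty, X, beta=1.0):
    """Weight each instance's uncertainty by its average cosine similarity
    to all other pool instances, raised to the power beta."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)                 # ignore self-similarity
    density = S.sum(axis=1) / (len(X) - 1)   # average similarity to the rest
    return uncertainty * density ** beta
```

With beta = 0 this reduces to plain uncertainty sampling, which makes the density term easy to ablate.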