2010
DOI: 10.1109/tasl.2009.2033421

Active Learning With Sampling by Uncertainty and Density for Data Annotations

Cited by 108 publications (61 citation statements)
References 21 publications
“…And the key to increasing the accuracy of an active machine learning algorithm lies in the selection of highly informative samples [13]. Conventional active learning algorithms form the initial training set by selecting highly representative samples through clustering analysis [14], and then label the most uncertain samples. However, they usually achieve unsatisfying results in interactive information retrieval due to the small initial training set and the existence of outliers [15].…”
Section: State of the Art
confidence: 99%
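The cluster-based seeding this excerpt describes can be sketched in a few lines. A minimal illustration, assuming scikit-learn's KMeans with Euclidean distance; the function name select_initial_set and the parameter n_seeds are hypothetical, not taken from the cited work.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_initial_set(X_unlabeled, n_seeds=10, random_state=0):
    """Pick one representative example per cluster as the initial training set.

    Sketch of cluster-based seeding: partition the unlabeled pool into
    n_seeds clusters, then take the point closest to each centroid so the
    seeds cover distinct regions of the pool.
    """
    km = KMeans(n_clusters=n_seeds, random_state=random_state, n_init=10)
    labels = km.fit_predict(X_unlabeled)
    seed_indices = []
    for k in range(n_seeds):
        members = np.where(labels == k)[0]
        dists = np.linalg.norm(X_unlabeled[members] - km.cluster_centers_[k], axis=1)
        seed_indices.append(members[np.argmin(dists)])
    return np.array(seed_indices)
```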
“…In addition to this confidence-based uncertainty measure, other measures are common as well (Settles 2012), like entropy or the margin between a candidate and the decision boundary. Similar to the issue of the true posterior above, a known drawback (Zhu et al. 2010) of US is that these proxies do not consider the number of similar instances on which the posterior estimates are made or the decision boundaries are drawn. The reported results of empirical evaluations are somewhat inconclusive, with some authors [e.g.…”
Section: Background and Related Work
confidence: 99%
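For concreteness, the three common uncertainty proxies the excerpt refers to (least confidence, margin, and entropy) can be written directly in terms of predicted class probabilities. A sketch, assuming a probs array of shape (n_samples, n_classes) from any probabilistic classifier; the function names are illustrative.

```python
import numpy as np

def least_confidence(probs):
    # Uncertainty = 1 - probability of the most likely class (higher = query first).
    return 1.0 - probs.max(axis=1)

def margin(probs):
    # Difference between the top two class probabilities;
    # examples with the SMALLEST margin are the most uncertain.
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def entropy(probs, eps=1e-12):
    # Shannon entropy of the predicted class distribution (higher = more uncertain).
    return -np.sum(probs * np.log(probs + eps), axis=1)
```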
“…Xu et al. (2003) proposed a representative sampling method, which first clusters the unlabeled examples located in the margin of an SVM classifier, and then queries the labels of the examples that are close to each cluster centroid. Zhu et al. (2010) presented a K-Nearest-Neighbor-based density measure that quantifies density as the average similarity between an unlabeled example and its K nearest neighbors, weighting the entropy-based uncertainty by this KNN density. McCallum et al. (1998) proposed a density-weighted QBC algorithm, which chooses examples with the highest committee disagreement in predicted labels, weighted by sample density.…”
Section: General Active Learning
confidence: 99%
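The KNN density weighting described in this excerpt is straightforward to sketch. A minimal illustration, assuming cosine similarity as the similarity function (the excerpt does not fix a particular one) and a pool with more than k examples; the names knn_density and sud_scores are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def knn_density(X_pool, k=10):
    """Average similarity of each example to its k nearest neighbors."""
    sim = cosine_similarity(X_pool)
    np.fill_diagonal(sim, -np.inf)       # exclude self-similarity
    topk = np.sort(sim, axis=1)[:, -k:]  # k most similar neighbors (assumes len(X_pool) > k)
    return topk.mean(axis=1)

def sud_scores(probs, X_pool, k=10, eps=1e-12):
    """Entropy-based uncertainty weighted by KNN density, as described above.

    Dense, uncertain examples score high; isolated outliers are down-weighted
    even when the classifier is uncertain about them.
    """
    ent = -np.sum(probs * np.log(probs + eps), axis=1)
    return ent * knn_density(X_pool, k)

# Usage: query the example with the highest density-weighted uncertainty, e.g.
# query_idx = np.argmax(sud_scores(model.predict_proba(X_pool), X_pool))
```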
“…However, the major shortcoming is that they cannot differentiate outliers from informative points, and thus often fail by selecting outliers (Settles 2012). To solve this so-called outlier problem, several density-weighted active learning approaches have been proposed that model the input distribution explicitly during data sampling (Xu et al. 2003; Zhu et al. 2010; McCallum and Nigam 1998; Nguyen and Smeulders 2004). The central idea of using prior data density in active learning is that it considers the whole input space rather than individual data points.…”
confidence: 99%