2003
DOI: 10.1007/3-540-36618-0_28

Representative Sampling for Text Classification Using Support Vector Machines

Abstract: In order to reduce human efforts, there has been increasing interest in applying active learning for training text classifiers. This paper describes a straightforward active learning heuristic, representative sampling, which explores the clustering structure of 'uncertain' documents and identifies the representative samples to query the user opinions, for the purpose of speeding up the convergence of Support Vector Machine (SVM) classifiers. Compared with other active learning algorithms, the propose…

Citations: cited by 158 publications (137 citation statements)
References: 9 publications
“…However, after rapid initial gains, DWUS exhibits very slow additional learning while uncertainty sampling continues to exhibit more rapid improvement. 1 A similar behavior is also evident in [8] where their representative sampling method increases accuracy in the initial phase while uncertainty sampling has a slower learning rate, but gradually outperforms their method. We investigated the Spearman's ranking correlation over candidates to be labeled by density and uncertainty in our scenario, and found that they seldom reinforce each other, but instead they tend to disagree on sample point selection.…”
Section: Motivation For Dual (mentioning)
confidence: 67%
“…These parameters and the δ parameter used for switching criteria were estimated on other data sets and held constant throughout our experiments, in order to avoid over-tuning. We compared the performance of DUAL with that of DWUS, uncertainty sampling, representative sampling 2 [8], density-based sampling and the COMB method of [10]. Density-based sampling adopts the same probabilistic framework as DWUS but uses only the density information for active data selection: $x_s^* = \arg\max_{i \in I_u} p(x_i)$.…”
Section: Methods (mentioning)
confidence: 99%
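
The density-based selection rule quoted above (pick the unlabeled point with the highest density p(x_i)) might be sketched roughly as follows. This is only an illustration under stated assumptions: the citing paper estimates p(x) within the DWUS probabilistic framework, whereas this sketch substitutes a plain Gaussian kernel density estimate, and the function name `density_based_select` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np

def density_based_select(X, unlabeled_idx, bandwidth=1.0):
    """Return the index i in unlabeled_idx that maximizes an estimated p(x_i).

    Assumption: p(x) is approximated by a Gaussian kernel density estimate
    over the whole pool X, not by the density model of the cited work.
    """
    X = np.asarray(X, dtype=float)
    candidates = X[unlabeled_idx]
    # Squared Euclidean distance from each candidate to every pool point.
    sq_dist = ((candidates[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Unnormalized kernel density; normalization does not change the arg max.
    density = np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1)
    return unlabeled_idx[int(np.argmax(density))]

# Three points clustered near the origin and one isolated point.
pool = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
chosen = density_based_select(pool, unlabeled_idx=[1, 3])
```

Here the candidate at index 1 sits inside the dense cluster, so it is preferred over the isolated point at index 3.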