Proceedings of the 10th Annual Conference Companion on Genetic and Evolutionary Computation 2008
DOI: 10.1145/1388969.1389020

Informative sampling for large unbalanced data sets

Abstract: Selective sampling is a form of active learning which can reduce the cost of training by only drawing informative data points into the training set. This selected training set is expected to contain more information for modeling compared to random sampling, thus making modeling faster and more accurate. We introduce a novel approach to selective sampling, which is derived from the Estimation-Exploration Algorithm (EEA). The EEA is a coevolutionary algorithm that uses model disagreement to determine the signifi…
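The "model disagreement" criterion mentioned in the abstract can be made concrete with a small sketch. The vote-based score below is an illustrative assumption rather than the paper's exact measure (the paper evolves an ensemble of neural networks with the EEA); it presumes only binary 0/1 labels and classifiers that expose a predict() method.

```python
import numpy as np

def disagreement_score(ensemble, x):
    """Disagreement of an ensemble's binary (0/1) label predictions on one
    candidate point: 0.0 when the models are unanimous, 1.0 when they are
    split exactly 50/50."""
    votes = np.array([clf.predict(x.reshape(1, -1))[0] for clf in ensemble])
    p = votes.mean()                     # fraction of models predicting class 1
    return 1.0 - abs(2.0 * p - 1.0)
```

A point that every model labels the same way adds little new information, while a point that splits the ensemble is exactly where additional training data would help most.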

Cited by 7 publications (7 citation statements)
References 29 publications
“…Informative sampling [6] (henceforth referred to as IS) applies the EEA as an active learning method for classification. The algorithm works by iteratively evolving an ensemble of classifiers based on the current training set and scanning a portion of the whole data set to select the data point that causes maximal disagreement.…”
Section: Introduction and Methods (mentioning)
confidence: 99%
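Read literally, the loop described in this excerpt can be sketched as follows. The scikit-learn MLPClassifier stand-in, the scan-window size, and the seed-set construction are assumptions made for illustration; the paper evolves its neural-network ensemble with the EEA rather than retraining with gradient descent.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # stand-in for the evolved ANNs

def _disagreement(ensemble, x):
    """Vote-based disagreement on one point, assuming binary 0/1 labels."""
    votes = np.array([clf.predict(x.reshape(1, -1))[0] for clf in ensemble])
    p = votes.mean()                 # fraction of models voting for class 1
    return 1.0 - abs(2.0 * p - 1.0)  # maximal when the ensemble is split 50/50

def informative_sampling(X, y, n_rounds=50, ensemble_size=5, scan_size=1000, seed=0):
    """Sketch of the selective-sampling loop: each round, (re)fit an ensemble on
    the current training set, scan a random portion of the full data set, and add
    the single point on which the ensemble's predictions disagree the most."""
    rng = np.random.default_rng(seed)
    # Seed the training set with one example of each class so every model sees both labels.
    train_idx = [int(np.where(y == c)[0][0]) for c in np.unique(y)]

    for _ in range(n_rounds):
        ensemble = [
            MLPClassifier(hidden_layer_sizes=(8,), max_iter=300,
                          random_state=int(rng.integers(1_000_000)))
            .fit(X[train_idx], y[train_idx])
            for _ in range(ensemble_size)
        ]
        # Scan only a portion of the whole data set, as the excerpt describes.
        scan_idx = rng.choice(len(X), size=min(scan_size, len(X)), replace=False)
        best = max(scan_idx, key=lambda i: _disagreement(ensemble, X[i]))
        train_idx.append(int(best))   # add the most disagreement-causing point

    return train_idx
```

Each round adds exactly one point, so the final training set holds n_rounds plus the handful of seed examples, regardless of how large or unbalanced the full data set is.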
“…The algorithm works by iteratively evolving an ensemble of classifiers based on the current training set and scanning a portion of the whole data set to select the data point that causes maximal disagreement. Although artificial neural networks [4] were chosen as the classifier type in [6], informative sampling is a general algorithm that could work with a wide range of classifiers. Many types of classifiers exist in the literature, such as decision trees, ANNs, and support vector machines (SVM).…”
Section: Introduction and Methods (mentioning)
confidence: 99%
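Because the disagreement criterion relies only on label predictions, the MLP stand-in in the sketch above can be swapped for the other classifier families this excerpt mentions. The mixed ensemble below is an illustrative assumption, not something evaluated in [6].

```python
from sklearn.tree import DecisionTreeClassifier  # decision trees
from sklearn.svm import SVC                      # support vector machines

def make_mixed_ensemble(X_train, y_train, size=4):
    """Any collection of fitted models exposing predict() can play the role of
    the ensemble, e.g. alternating decision trees and SVMs instead of MLPs."""
    models = []
    for i in range(size):
        base = (DecisionTreeClassifier(max_depth=3, random_state=i)
                if i % 2 == 0 else SVC(kernel="rbf", random_state=i))
        models.append(base.fit(X_train, y_train))
    return models
```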
“…Informative sampling [13] applied the EEA as an active learning method for classification. The algorithm works by iteratively optimizing an ensemble of classifiers based on the current training set and scanning a portion of the whole data set to select the data point that causes maximal disagreement among the label predictions of the current classifiers.…”
Section: Introduction (mentioning)
confidence: 99%
“…It has been shown to outperform random sampling [21] and balanced sampling [11] for a large unbalanced data set called the National Trauma Data Bank (NTDB). Although artificial neural networks (ANN) [10] were chosen as the classifier type in [13], informative sampling is a general algorithm that could work with any classifier. Many types of classifiers exist in the literature, such as decision trees (DT) [16], ANNs [10], and SVMs.…”
Section: Introduction (mentioning)
confidence: 99%
“…In a different approach, the estimation-exploration algorithm (EEA) [19][20][21] uses stochastic optimization algorithms to optimize models and find disagreement-causing data points. Informative sampling [22] adapted the EEA for data mining, but its performance is constrained by the limited power of stochastic optimization. For some classifiers, a satisfactory stochastic optimization algorithm may not exist, and existing classifiers such as C4.5 and Naive Bayes [23] cannot currently be used in this framework.…”
(mentioning)
confidence: 99%