1997
DOI: 10.1007/3-540-62858-4_79

Learning when negative examples abound

Abstract: Existing concept learning systems can fail when negative examples heavily outnumber positive examples. The paper discusses one essential problem caused by imbalanced training sets and presents a learning algorithm that addresses it. The experiments (with synthetic and real-world data) focus on two-class problems whose examples are described by binary and continuous attributes.
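The failure mode the abstract describes can be illustrated with a minimal sketch (the data here are invented for the example): when negatives heavily outnumber positives, a degenerate classifier that always predicts the majority class still attains high overall accuracy while detecting no positive at all.

```python
# Synthetic illustration: 10 positives vs. 990 negatives.
labels = [1] * 10 + [0] * 990          # 10 positives, 990 negatives
predictions = [0] * len(labels)        # majority-class ("always negative") classifier

# Overall accuracy looks excellent...
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
# ...but not a single positive example is recognized.
true_positive_rate = sum(p == y == 1 for p, y in zip(predictions, labels)) / 10

print(accuracy)             # 0.99
print(true_positive_rate)   # 0.0
```

This is why the citing works below prefer class-sensitive measures (e.g. the geometric mean of the per-class accuracies) over plain classification accuracy on imbalanced data.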

Cited by 251 publications (178 citation statements); references 6 publications.
“…We would have had to model both the within-batch characteristics and the across-batch characteristics, and we simply did not have enough data or batches to do this with any certainty. To try to ensure that our learning algorithm is not specific to our particular dataset, we have tested it on other datasets having similar characteristics (Kubat, Holte & Matwin, 1997).…”
Section: Methodological Issues
confidence: 99%
“…Since we have an unbalanced class distribution (see Table 1), classification accuracy can give misleading results. For such domains a more appropriate measure is the Information score [7] or the Geometric mean of accuracy [8]. In the experimental results presented in Figure 1, Classification accuracy and Information score are used to estimate model quality.…”
Section: Methods
confidence: 99%
“…Commonly used metrics for two-class problems 24,45 include the arithmetic and geometric means of the sensitivity acc+ = TP rate and the specificity acc− = TN rate. In particular, the geometric mean of the two values is an interesting indicator of the quality of a classifier for imbalanced data, because it is high when both acc+ and acc− are high or when the difference between acc+ and acc− is small 29. Optimizing the geometric mean is a compromise intended to maximize the accuracy on both classes while keeping these accuracies balanced 30.…”
Section: Notation and Metrics for Two-Class Problems
confidence: 99%
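The geometric mean described in the excerpt above can be computed directly from the confusion-matrix counts. A minimal sketch (the function and variable names are ours, not from the cited works):

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (acc+) and specificity (acc-)."""
    acc_pos = tp / (tp + fn)   # acc+ : true-positive rate (sensitivity)
    acc_neg = tn / (tn + fp)   # acc- : true-negative rate (specificity)
    return math.sqrt(acc_pos * acc_neg)

# A classifier with balanced class-wise accuracies scores higher than one
# that trades minority-class recall for majority-class accuracy:
print(g_mean(tp=8, fn=2, tn=800, fp=190))   # acc+ = 0.8, acc- ≈ 0.808 → ≈ 0.804
print(g_mean(tp=1, fn=9, tn=985, fp=5))     # acc+ = 0.1, acc- ≈ 0.995 → ≈ 0.315
```

As the excerpt notes, the second classifier's near-perfect specificity cannot compensate for its poor sensitivity: the geometric mean collapses whenever either class-wise accuracy is low, which is exactly the property that makes it useful for imbalanced data.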