1998
DOI: 10.1023/a:1007452223027
Machine Learning for the Detection of Oil Spills in Satellite Radar Images

Abstract: During a project examining the use of machine learning techniques for oil spill detection, we encountered several essential questions that we believe deserve the attention of the research community. We use our particular case study to illustrate such issues as problem formulation, selection of evaluation measures, and data preparation. We relate these issues to properties of the oil spill application, such as its imbalanced class distribution, that are shown to be common to many applications. Our sol…

Cited by 1,048 publications (139 citation statements)
References 51 publications
“…In fact, the two types of misclassification were given the same cost in the learning process, but since the classes are imbalanced (class 3 being three times more frequent than class 1), the classifier tends to classify the more frequent class better. This is a well-known problem for machine learning classifiers, encountered in various domains [16]–[18], and it can be addressed. If one considers, for instance, that classifying a sample of class 1 (high priority) as class 3 (low priority) is costlier than the reverse, then this could be taken into account in the training procedure by over-representing class 1 (or under-representing class 3) in the learning database.…”
Section: Results
confidence: 99%
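The cost asymmetry this excerpt describes can be approximated by simple replication. The helper below is a hypothetical sketch (not from the cited work; the function name and data are invented) showing how duplicating the rare class three-fold, mirroring the 3:1 imbalance, balances the classes the learner sees:

```python
def oversample_minority(samples, labels, minority_label, factor):
    """Duplicate minority-class examples `factor`-fold so the learner
    sees them more often -- roughly equivalent to raising the cost of
    misclassifying the minority class."""
    out_x, out_y = [], []
    for x, y in zip(samples, labels):
        copies = factor if y == minority_label else 1
        out_x.extend([x] * copies)
        out_y.extend([y] * copies)
    return out_x, out_y

# Class 3 is three times more frequent than class 1, so replicate class 1 thrice.
X = [[0.1], [0.2], [0.9], [0.8], [0.7], [0.6], [0.5], [0.4]]
y = [1, 1, 3, 3, 3, 3, 3, 3]
Xb, yb = oversample_minority(X, y, minority_label=1, factor=3)
# yb now holds six 1s and six 3s: the classes are balanced.
```

Under-representing the majority class instead (dropping class-3 examples) achieves the same ratio at the cost of discarding data.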
“…The machine learning community has approached this problem through both resampling the original data set (either by oversampling the minority class or undersampling the majority class; Lewis & Catlett 1994; Kubat & Matwin 1997; Ling & Li 1998; Japkowicz 2000) and adding costs to the training examples (Pazzani et al. 1994; Domingos 1999). SMOTE provides an approach that combines oversampling the minority (or interesting) class with undersampling the majority class.…”
Section: Further Analysis
confidence: 99%
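SMOTE's core idea is to generate new synthetic minority examples by interpolating between a minority point and one of its nearest minority-class neighbours, rather than replicating existing points. A minimal sketch of that interpolation step (not the reference implementation; names and data are invented, and k is fixed small for illustration):

```python
import random

def smote_sample(minority, k=1, rng=random.Random(0)):
    """Generate one synthetic minority example by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    base = rng.choice(minority)
    # nearest minority neighbours by squared Euclidean distance
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
    )[:k]
    nb = rng.choice(neighbours)
    gap = rng.random()  # random position along the segment base -> nb
    return [a + gap * (b - a) for a, b in zip(base, nb)]

minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
synth = smote_sample(minority)
# synth lies on the line segment between a minority point and its neighbour,
# so it is a new point rather than a duplicate.
```

Because the synthetic point falls between existing minority examples, it broadens the minority region the classifier learns instead of sharpening it around repeated copies.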
“…Another method based on the Nearest Neighbour Rule is One-Sided Selection (OSS) [60]. It is based on the idea of discarding instances distant from the decision border, since such instances can be considered useless for learning.…”
Section: Under-sampling
confidence: 99%
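The "discard distant instances" intuition in this excerpt can be sketched as keeping only the majority examples closest to the minority class. Note this is a simplification for illustration only: full OSS also uses Tomek links and a condensed consistent subset, which are omitted here, and the function name and data are invented:

```python
def undersample_distant(majority, minority, keep):
    """Keep the `keep` majority instances nearest to the minority class,
    discarding majority points far from the decision border."""
    def dist_to_minority(p):
        # squared Euclidean distance to the closest minority instance
        return min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in minority)
    return sorted(majority, key=dist_to_minority)[:keep]

majority = [[5.0], [4.0], [1.1], [1.2], [6.0]]
minority = [[1.0]]
kept = undersample_distant(majority, minority, keep=2)
# kept == [[1.1], [1.2]]: the majority points near the border survive,
# while the distant (redundant) points are dropped.
```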
“…Two disadvantages of this method have been described in the literature. First, instance replication increases the likelihood of over-fitting [19]; second, enlarging the training set by over-sampling can lead to a longer learning phase and slower model response [60], mainly for lazy learners.…”
Section: Over-sampling
confidence: 99%
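Both disadvantages stem from the fact that random over-sampling adds exact copies: repeated points invite memorisation, and every copy enlarges the set a lazy learner must scan at prediction time. A small invented sketch making the duplication explicit:

```python
import random

def random_oversample(minority, target_size, rng=random.Random(0)):
    """Random over-sampling: grow the minority class to `target_size`
    by drawing existing instances with replacement (plain replication)."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
grown = random_oversample(minority, target_size=9)
duplicates = len(grown) - len(set(grown))
# Every added instance is an exact copy of an original (duplicates == 6).
# These repeats can be memorised by the learner (over-fitting), and for a
# lazy learner such as k-NN the 3x larger set triples per-query work.
```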