1998
DOI: 10.1023/a:1007452223027
Machine Learning for the Detection of Oil Spills in Satellite Radar Images

Abstract: During a project examining the use of machine learning techniques for oil spill detection, we encountered several essential questions that we believe deserve the attention of the research community. We use our particular case study to illustrate such issues as problem formulation, selection of evaluation measures, and data preparation. We relate these issues to properties of the oil spill application, such as its imbalanced class distribution, that are shown to be common to many applications. Our sol…

Cited by 1,048 publications (139 citation statements)
References 51 publications
“…In fact, the two types of misclassification were given the same cost in the learning process, but since the classes are imbalanced (class 3 being three times more frequent than class 1), the classifier tends to classify the more frequent class better. This is a well-known problem for machine learning classifiers, encountered in various domains [16]–[18], and it can be addressed. If one considers, for instance, that classifying a sample of class 1 (high priority) as class 3 (low priority) is costlier than the reverse, then this could be taken into account in the training procedure by over-representing class 1 (or under-representing class 3) in the learning database.…”
Section: Results
confidence: 99%
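The cost asymmetry this excerpt describes can be approximated by simple replication. The helper below is a hypothetical sketch (not from the cited work; the function name and data are invented) showing how duplicating the rare class three-fold, mirroring the 3:1 imbalance, balances the classes the learner sees:

```python
def oversample_minority(samples, labels, minority_label, factor):
    """Duplicate minority-class examples `factor`-fold so the learner
    sees them more often -- roughly equivalent to raising the cost of
    misclassifying the minority class."""
    out_x, out_y = [], []
    for x, y in zip(samples, labels):
        copies = factor if y == minority_label else 1
        out_x.extend([x] * copies)
        out_y.extend([y] * copies)
    return out_x, out_y

# Class 3 is three times more frequent than class 1, so replicate class 1 thrice.
X = [[0.1], [0.2], [0.9], [0.8], [0.7], [0.6], [0.5], [0.4]]
y = [1, 1, 3, 3, 3, 3, 3, 3]
Xb, yb = oversample_minority(X, y, minority_label=1, factor=3)
# yb now holds six 1s and six 3s: the classes are balanced.
```

Under-representing the majority class instead (dropping class-3 examples) achieves the same ratio at the cost of discarding data.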
“…The machine learning community has approached this problem through both resampling the original data set (either by oversampling the minority class or undersampling the majority class; Lewis & Catlett 1994; Kubat & Matwin 1997; Ling & Li 1998; Japkowicz 2000) and adding costs to the training examples (Pazzani et al. 1994; Domingos 1999). SMOTE provides an approach that combines oversampling the minority (or interesting) class with undersampling the majority class.…”
Section: Further Analysis
confidence: 99%
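SMOTE's core idea is to generate new synthetic minority examples by interpolating between a minority point and one of its nearest minority-class neighbours, rather than replicating existing points. A minimal sketch of that interpolation step (not the reference implementation; names and data are invented, and k is fixed small for illustration):

```python
import random

def smote_sample(minority, k=1, rng=random.Random(0)):
    """Generate one synthetic minority example by interpolating between a
    random minority point and one of its k nearest minority neighbours."""
    base = rng.choice(minority)
    # nearest minority neighbours by squared Euclidean distance
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
    )[:k]
    nb = rng.choice(neighbours)
    gap = rng.random()  # random position along the segment base -> nb
    return [a + gap * (b - a) for a, b in zip(base, nb)]

minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
synth = smote_sample(minority)
# synth lies on the line segment between a minority point and its neighbour,
# so it is a new point rather than a duplicate.
```

Because the synthetic point falls between existing minority examples, it broadens the minority region the classifier learns instead of sharpening it around repeated copies.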
“…Another method based on the Nearest Neighbour Rule is One-Sided Selection (OSS) [60]. It is based on the idea of discarding instances distant from the decision border, since such instances can be considered useless for learning.…”
Section: Under-sampling
confidence: 99%
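The "discard distant instances" intuition in this excerpt can be sketched as keeping only the majority examples closest to the minority class. Note this is a simplification for illustration only: full OSS also uses Tomek links and a condensed consistent subset, which are omitted here, and the function name and data are invented:

```python
def undersample_distant(majority, minority, keep):
    """Keep the `keep` majority instances nearest to the minority class,
    discarding majority points far from the decision border."""
    def dist_to_minority(p):
        # squared Euclidean distance to the closest minority instance
        return min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in minority)
    return sorted(majority, key=dist_to_minority)[:keep]

majority = [[5.0], [4.0], [1.1], [1.2], [6.0]]
minority = [[1.0]]
kept = undersample_distant(majority, minority, keep=2)
# kept == [[1.1], [1.2]]: the majority points near the border survive,
# while the distant (redundant) points are dropped.
```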
“…Two disadvantages of this method have been described in the literature. First, instance replication increases the likelihood of over-fitting [19]; second, enlarging the training set by over-sampling can lead to a longer learning phase and slower model response [60], mainly for lazy learners.…”
Section: Over-sampling
confidence: 99%
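Both disadvantages stem from the fact that random over-sampling adds exact copies: repeated points invite memorisation, and every copy enlarges the set a lazy learner must scan at prediction time. A small invented sketch making the duplication explicit:

```python
import random

def random_oversample(minority, target_size, rng=random.Random(0)):
    """Random over-sampling: grow the minority class to `target_size`
    by drawing existing instances with replacement (plain replication)."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
grown = random_oversample(minority, target_size=9)
duplicates = len(grown) - len(set(grown))
# Every added instance is an exact copy of an original (duplicates == 6).
# These repeats can be memorised by the learner (over-fitting), and for a
# lazy learner such as k-NN the 3x larger set triples per-query work.
```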