Proceedings of the 2019 8th International Conference on Educational and Information Technology 2019
DOI: 10.1145/3318396.3318448

A Study on the Effect of Feature Selection on Malware Analysis using Machine Learning

Abstract: This paper studies the effect of feature selection on malware detection using machine learning techniques. We employ supervised and unsupervised machine learning algorithms, covering both classification and clustering, with and without feature selection. The algorithms are compared for effectiveness and efficiency using predictive accuracy, among other performance metrics. From the studies, we observe that the best detection rate was attained for supervised learning with feature…

Citations: cited by 28 publications (14 citation statements)
References: 24 publications
“…Some features might provide limited information on the actual contents of malicious applications to the classifier [41,42]. The imperative goals of any malware-detection system include the identification of a subset of features from the entire feature set, with subsequent reduction in the high data dimensionality.…”
Section: Feature Selection Metrics (mentioning)
confidence: 99%
“…Firstly, we will use ROC, in which x-axis is FPR (defined by Eq (18)) and y-axis is TPR (defined by Eq (19)), and AUC, which denotes the area under ROC curve, to verify the effectiveness of the original DeepDetectNet, the result of which will be used as the baseline for later experiments.…”
Section: Procedures Of Experiments (mentioning)
confidence: 99%
“…Santos I et al [13], Gandotra E et al [14], Niu Z et al [15], Wang C et al [16], and Hu X et al [17] extracts the opcode sequence of a PE file with external tools, and then takes the sequence as the input of the deep learning model, allowing the learning model to extract features automatically. Babaagba KO et al [18] concluded the method for feature selection.…”
Section: Introduction (mentioning)
confidence: 99%
“…The original feature vectors are usually of high dimensions with some useless features, increasing in time cost and decreasing in accuracy [31]- [35]. Therefore, it is necessary to perform certain dimensionality reduction on features.…”
Section: Feature Selection and Similarity Measurement (mentioning)
confidence: 99%