Improvement of Malware Classification Using Hybrid Feature Engineering

Masabo, Emmanuel; Kaawaase, Kyanda Swaib; Sansa-Otim, Julianne; Ngubiri, John; Hanyurwimfura, Damien

doi:10.1007/s42979-019-0017-9

Cited by 8 publications

(16 citation statements)

References 17 publications

“…The problem has continued to draw an increasing attention from researchers and practitioners alike [7,8,10,18,21–28]. Although choosing a subset of features from the original features is a combinatorial problem, many suboptimal heuristics have been put forward and used in various domains, which include the chi-squared based feature subset selection [7,8,10], the analysis of variance (ANOVA) [7,8,10], mutual information [7,23,29] and information gain [18,25–27]. Many studies have shown that feature selection approaches that select good feature subset will have significant impact on reducing the complexity in processing by eliminating unimportant features and enhance the performance of the learning models [24,30].…”

Section: Related Work (mentioning)

confidence: 99%

“…Generally, static and dynamic analysis methods are utilized to extract typical malware descriptive behaviour (i.e., features) from the raw data. These feature extraction methods normally generate very large high-dimensional, redundant and noisy features [10, 11]. Some of the raw features offer little or no information that is useful to distinguish malware apps from benign apps and may even impact the performance of the malware detection methods [10, 12, 13, 14].…”

Section: Introduction (mentioning)

confidence: 99%

“…Feature selection algorithms select a subset of features from the original feature set, which are considered useful for training the learning models to obtain good results [2, 10]. A growing number of Android malware detection models have applied different feature subset selection algorithms and have achieved good detection rates [8, 16].…”

Section: Introduction (mentioning)

confidence: 99%

“…A growing number of Android malware detection models have applied different feature subset selection algorithms and have achieved good detection rates [8, 16]. However, research on the usefulness of the state-of-the-art feature subset selection methods in the context of Android malware detection models have not received the attention it deserves [10]. To this end, we investigate the utility of the commonly used feature subset selection approaches for malware detection in Android platforms.…”

Section: Introduction (mentioning)

confidence: 99%

Feature Subset Selection for Malware Detection in Smart IoT Platforms

Abawajy

Darem

Alhashmi

2021

Sensors

Malicious software (“malware”) has become one of the serious cybersecurity issues in the Android ecosystem. Given the fast evolution of Android malware releases, it is practically not feasible to manually detect malware apps in the Android ecosystem. As a result, machine learning has become a fledgling approach for malware detection. Since machine learning performance is largely influenced by the availability of high quality and relevant features, feature selection approaches play a key role in machine learning based detection of malware. In this paper, we formulate the feature selection problem as a quadratic programming problem and analyse how commonly used filter-based feature selection methods work, with emphasis on Android malware detection. We compare and contrast several feature selection methods along several factors including the composition of relevant features selected. We empirically evaluate the predictive accuracy of the feature subset selection algorithms and compare their predictive accuracy and execution time using several learning algorithms. The results of the experiments confirm that feature selection is necessary for improving the accuracy of the learning models as well as decreasing the run time. The results also show that the performance of the feature selection algorithms varies from one learning algorithm to another and that no one feature selection approach performs better than the other approaches all the time.

“…Initial research studies focused on permission-based detection, signature-based detection, system call-based detection, and sensitive API-based detection. Feature-selection algorithms such as information gain (IG), principal component analysis (PCA), Chi-Square (χ²), and analysis of variance (ANOVA) were suggested to improve the detection performance [23]. Machine-learning techniques have also been applied to automate malware detection strategies [24].…”

Section: Related Work (mentioning)

confidence: 99%

Automated Malware Detection in Mobile App Stores Based on Robust Feature Generation

Alazab

2020

Electronics

Many Internet of Things (IoT) services are currently tracked and regulated via mobile devices, making them vulnerable to privacy attacks and exploitation by various malicious applications. Current solutions are unable to keep pace with the rapid growth of malware and are limited by low detection accuracy, long discovery time, complex implementation, and high computational costs associated with the processor speed, power, and memory. Therefore, an automated intelligence technique is necessary for detecting apps containing malware and effectively predicting cyberattacks in mobile marketplaces. In this study, a system for classifying mobile marketplaces applications using real-world datasets is proposed, which analyzes the source code to identify malicious apps. A rich feature set of application programming interface (API) calls is proposed to capture the regularities in apps containing malicious content. Two feature-selection methods—Chi-Square and ANOVA—were examined in conjunction with ten supervised machine-learning algorithms. The detection accuracy of each classifier was evaluated to identify the most reliable classifier for malware detection using various feature sets. Chi-Square was found to have a higher detection accuracy as compared to ANOVA. The proposed system achieved a detection accuracy of 98.1% with a classification time of 1.22 s. Furthermore, the proposed system required a reduced number of API calls (500 instead of 9000) to be incorporated as features.
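
The filter-style selection this abstract describes (ranking API-call features by a Chi-Square statistic and keeping the top few) can be illustrated with a small sketch. This is not the cited authors' implementation: the function names `chi2_score` and `select_top_k`, the binary present/absent feature encoding, and the toy data are all assumptions made for illustration, and the statistic used is the plain Pearson chi-squared test on a 2×2 contingency table.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic for one binary feature vs. binary labels.

    Hypothetical helper: `feature` and `labels` are equal-length sequences of 0/1.
    """
    n = len(labels)
    observed = Counter(zip(feature, labels))   # counts per (feature, label) cell
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals
    score = 0.0
    for f in (0, 1):
        for y in (0, 1):
            expected = feature_totals[f] * label_totals[y] / n
            if expected > 0:                   # skip empty rows/columns
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score


def select_top_k(rows, labels, names, k):
    """Return the names of the k features with the highest chi-squared score."""
    columns = list(zip(*rows))                 # one tuple of values per feature
    scored = sorted(
        ((chi2_score(col, labels), name) for col, name in zip(columns, names)),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Toy data (assumed): each row is one app, each column marks whether an API call appears.
names = ["sendTextMessage", "getDeviceId", "setContentView"]
rows = [
    (1, 1, 1),
    (1, 1, 1),
    (1, 0, 1),
    (0, 0, 1),
    (0, 1, 1),
    (0, 0, 1),
]
labels = [1, 1, 1, 0, 0, 0]                    # 1 = malware, 0 = benign

print(select_top_k(rows, labels, names, k=2))
```

In this toy set the first feature separates the two classes perfectly and so ranks highest, while the third appears in every app and scores zero, which is the kind of uninformative feature a filter method discards.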
