Malware is becoming increasingly sophisticated and difficult to detect with traditional monitoring tools and antivirus software. As a result, machine learning has become a popular approach for classifying and detecting malware. In this study, two datasets, Malware-Exploratory and CIC-MalMem-2022, are analyzed with a series of supervised and unsupervised learning procedures. Because this study extends previous research, the model developed there is enhanced with feature selection based on the Pearson correlation coefficient and on a genetic algorithm. The enhanced model is then tested on a newly created dataset, SMITH, and on a synthetic dataset generated from SMITH with a generative adversarial network (GAN), as well as on the Malware-Exploratory and CIC-MalMem-2022 datasets from the previous work. The model retains three clustering algorithms for analysis (K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Gaussian Mixture Model) and seven classification algorithms for predicting malware (Decision Tree, Random Forest, AdaBoost, K-Nearest Neighbors, Stochastic Gradient Descent, Extra Trees, and Gaussian Naïve Bayes). In the previous work, the model achieved an accuracy of 90% on the raw Malware-Exploratory dataset and 99% on the raw CIC-MalMem-2022 dataset. The results of this study show that genetic-algorithm feature selection emerges as the best approach for detecting malware in the Malware-Exploratory and CIC-MalMem-2022 datasets, while Pearson correlation feature selection performs well on the SMITH dataset.
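The abstract does not specify how the Pearson correlation coefficient is used to select features. One common filter pattern, assumed here purely for illustration and not necessarily the authors' method, keeps features whose absolute correlation with the malware label exceeds a threshold and then drops features that are nearly collinear with one already kept. The function names `pearson` and `select_by_pearson`, the thresholds `min_target_corr` and `max_mutual_corr`, and the toy data are all hypothetical; the sketch uses only the Python standard library.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_by_pearson(
    features: Dict[str, List[float]],
    labels: List[float],
    min_target_corr: float = 0.3,   # hypothetical relevance threshold
    max_mutual_corr: float = 0.9,   # hypothetical redundancy threshold
) -> List[str]:
    """Keep features correlated with the label, dropping redundant ones."""
    # Rank candidate features by |r| with the label, strongest first.
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], labels)),
        reverse=True,
    )
    selected: List[str] = []
    for name in ranked:
        if abs(pearson(features[name], labels)) < min_target_corr:
            continue  # too weakly related to the label
        # Skip a feature that is nearly collinear with one already kept.
        if any(abs(pearson(features[name], features[s])) > max_mutual_corr
               for s in selected):
            continue
        selected.append(name)
    return selected


# Toy usage: 1 = malware, 0 = benign (hypothetical data).
labels = [0, 0, 1, 1, 0, 1]
features = {
    "f_signal": [0.1, 0.2, 0.9, 0.8, 0.15, 0.95],  # tracks the label
    "f_copy": [0.2, 0.4, 1.8, 1.6, 0.3, 1.9],      # 2 * f_signal, redundant
    "f_noise": [1, 5, 2, 4, 3, 3],                 # zero correlation with label
}
selected = select_by_pearson(features, labels)
print(selected)
```

Because `f_copy` is an exact rescaling of `f_signal`, only one of the two survives the redundancy check, and `f_noise` is rejected for having no linear relationship with the label. A genetic algorithm, by contrast, searches over feature subsets using classifier performance as the fitness signal, which is one plausible reason the two methods can rank differently across datasets.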