“…To employ KMeans clustering and PCA [46] for malware detection, a dataset of malware samples must first be preprocessed and feature-engineered to identify essential qualities that discriminate between various forms of malware. The dimensionality of the feature space is decreased by using PCA to isolate a smaller group of orthogonal axes that best capture the range of the data.…”
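The PCA step described in this excerpt can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes a small dense feature matrix whose rows are malware samples and whose columns are already-scaled numeric features. It finds the leading principal axes by power iteration with deflation on the sample covariance matrix, which is a simple stand-in for the SVD-based routines that libraries usually use. All function names (`mean_center`, `covariance`, `power_iteration`, `pca`) are hypothetical.

```python
import math
import random


def mean_center(X):
    """Subtract the per-feature mean from every sample (row)."""
    n, m = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - mu[j] for j in range(m)] for row in X]


def covariance(Xc):
    """Sample covariance matrix (m x m) of mean-centered data."""
    n, m = len(Xc), len(Xc[0])
    return [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(m)]
            for a in range(m)]


def power_iteration(C, iters=500, seed=0):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix C."""
    rng = random.Random(seed)
    m = len(C)
    v = [rng.random() + 0.1 for _ in range(m)]  # strictly positive start vector
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in this direction
            break
        v = [x / norm for x in w]
    eigval = sum(v[a] * sum(C[a][b] * v[b] for b in range(m)) for a in range(m))
    return eigval, v


def pca(X, n_components):
    """Project X onto its top n_components orthogonal principal axes."""
    Xc = mean_center(X)
    C = covariance(Xc)
    m = len(C)
    components = []
    for _ in range(n_components):
        eigval, v = power_iteration(C)
        components.append(v)
        # Deflation: subtract the variance along v so the next axis can be found.
        C = [[C[a][b] - eigval * v[a] * v[b] for b in range(m)] for a in range(m)]
    scores = [[sum(r[j] * comp[j] for j in range(m)) for comp in components]
              for r in Xc]
    return scores, components
```

For example, `pca([[t, 2 * t] for t in range(-5, 6)], 1)` recovers the axis (1, 2)/√5 up to sign, because every point lies on that line; the returned scores are the reduced features that a clustering step would consume.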
“…With KMeans clustering, a dataset X containing n malware samples with m features is divided into k clusters C1, C2, …, Ck. Reducing the total squared distances between every malware sample and its designated centroid is how Equation (7) accomplishes this [46].…”
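Equation (7) itself is not reproduced in this excerpt. The standard K-Means objective that matches the description is J = Σ_{j=1}^{k} Σ_{x_i ∈ C_j} ‖x_i − μ_j‖², where μ_j is the centroid (mean) of cluster C_j; whether Equation (7) uses exactly this notation is an assumption. The sketch below minimizes J with Lloyd's algorithm, the usual heuristic, by alternating an assignment step and a centroid-update step. The function names are hypothetical, and seeding the centroids with k random samples is a simplification (libraries often use k-means++ instead).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_objective(X, labels, centroids):
    """J: total squared distance from each sample to its assigned centroid."""
    return sum(sq_dist(x, centroids[lab]) for x, lab in zip(X, labels))


def kmeans(X, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update until stable."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]  # k distinct samples as seeds
    labels = None
    for _ in range(max_iters):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                      for x in X]
        if new_labels == labels:  # no sample changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, kmeans_objective(X, labels, centroids)
```

Neither step can increase J, which is why the loop stops once assignments no longer change; the result is a local minimum that depends on the initial centroids.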
The Internet of Things (IoT) constitutes the foundation of a deeply interconnected society in which objects communicate through the Internet. This innovation, coupled with 5G and artificial intelligence (AI), finds application in diverse sectors such as smart cities and advanced manufacturing. With increasing IoT adoption come heightened vulnerabilities, prompting research into identifying IoT malware. While existing models excel at spotting known malicious code, detecting new and modified malware remains challenging. This paper presents a novel six-step framework. It begins with eight malware attack datasets as input, followed by insights from Exploratory Data Analysis (EDA). Feature engineering includes scaling, One-Hot Encoding, target variable analysis, feature importance using Mean Decrease in Impurity (MDI) and XGBoost, and K-Means clustering combined with PCA-based dimensionality reduction. Our GhostNet ensemble, combined with the Gated Recurrent Unit Ensembler (GNGRUE), is trained on these datasets and fine-tuned using the Jaya Algorithm (JA) to identify and categorize malware. The tuned GNGRUE-JA is then tested on the malware datasets. A comprehensive comparison with existing models covers performance, evaluation criteria, time complexity, and statistical analysis. In extensive simulations, the proposed model outperforms existing methods by around 15% across metrics such as AUC, accuracy, recall, and Hamming loss, with a 10% reduction in time complexity. These results underscore the significance of the study's outcomes, particularly for cost-effective detection of eight malware strains.
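The abstract does not say how the Jaya Algorithm is applied to GNGRUE. As a hedged illustration only, the sketch below implements the commonly cited Jaya update, in which every candidate moves toward the current best solution and away from the current worst: x' = x + r1·(best − |x|) − r2·(worst − |x|), with r1 and r2 drawn uniformly from [0, 1] and a candidate kept only if it improves. The toy `sphere` objective, the bounds, the population size, and the iteration count are all assumptions; in a hyperparameter-tuning setting the objective would be something like a validation loss, which is not shown here.

```python
import random


def jaya(f, dim, bounds, pop_size=10, iters=200, seed=0):
    """Minimize f over [lo, hi]^dim with the parameter-free Jaya update."""
    lo, hi = bounds
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    history = [min(fit)]  # best objective value after each iteration
    for _ in range(iters):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for i in range(pop_size):
            x = pop[i]
            cand = []
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best solution and away from the worst one.
                v = x[d] + r1 * (best[d] - abs(x[d])) - r2 * (worst[d] - abs(x[d]))
                cand.append(min(max(v, lo), hi))  # clip to the search bounds
            fc = f(cand)
            if fc < fit[i]:  # greedy acceptance: keep only improvements
                pop[i], fit[i] = cand, fc
        history.append(min(fit))
    i_best = min(range(pop_size), key=fit.__getitem__)
    return pop[i_best], fit[i_best], history


def sphere(x):
    """Hypothetical toy objective standing in for a validation loss."""
    return sum(v * v for v in x)
```

Jaya needs no algorithm-specific control parameters beyond population size and iteration count, which is one reason it is attractive for tuning; because of greedy acceptance, the best value in `history` never increases.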