<p class="0abstract">The open-source nature of the Android operating system has attracted wide adoption by many types of developers. This has fostered a rapid proliferation of devices running Android across different sectors of the economy. Although this development has brought great technological advances and made business (e-commerce) and social interaction easier, these devices have also become a major channel for the rising number of cyberattacks and espionage against business infrastructure and individual users. Different cyberattack techniques exist, but attacks through malicious applications have taken the lead over other methods such as social engineering. Android malware has grown so sophisticated that it has become highly resistant to existing detection systems, especially signature-based ones. Machine learning techniques have emerged as a more capable choice for combating the sophistication and novelty of emerging Android malware. Models created with machine learning methods work by first learning existing patterns of malware behaviour and then using this knowledge to identify similar behaviour in unknown attacks. This paper provides a comprehensive review of machine learning techniques and their applications in Android malware detection as found in contemporary literature.</p>
<p>The Android operating system has become very popular, holding the highest market share among mobile operating systems, owing to its open-source nature and user-friendliness. This has brought about an uncontrolled rise in malicious applications targeting the Android platform. Emerging Android malware employs highly sophisticated techniques for avoiding detection and analysis, so traditional signature-based detection methods have become less effective at detecting new and unknown malware. Alternative approaches, such as machine learning techniques, have taken the lead for timely zero-day anomaly detection. The study aimed to develop an optimized Android malware detection model using an ensemble learning technique. Random Forest, Support Vector Machine, and k-Nearest Neighbours were used to develop three distinct base models, and their predictions were combined using a Majority Vote combination function to produce an ensemble model. A reverse engineering procedure was employed to extract static features from a large repository of malware samples and benign applications. The WEKA 3.8.2 data mining suite was used to perform all the learning experiments. The results showed that Random Forest had a true positive rate of 97.9% and a false positive rate of 1.9%, and correctly classified 98% of instances, making it a strong base model. The ensemble model had a true positive rate of 98.1% and a false positive rate of 1.8%, and correctly classified 98.16% of instances. The findings show that, although the base learners had good detection results, the ensemble learner produced a better-optimized detection model than any of the base learners.</p>
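The majority-vote combination described above can be illustrated with a minimal sketch. It is not the study's implementation (the experiments were run in WEKA 3.8.2). The function name `majority_vote` and the prediction lists are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(base_predictions):
    """Combine per-model label lists into one label per sample by majority vote.

    base_predictions: list of lists, one inner list of labels per base model.
    With three base models and two classes, a tie cannot occur.
    """
    n_samples = len(base_predictions[0])
    if any(len(preds) != n_samples for preds in base_predictions):
        raise ValueError("every base model must predict the same number of samples")

    combined = []
    for votes in zip(*base_predictions):  # votes for a single sample
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three base models on four apps.
rf_preds = ["malware", "benign", "malware", "benign"]
svm_preds = ["malware", "malware", "malware", "benign"]
knn_preds = ["benign", "benign", "malware", "benign"]

ensemble_preds = majority_vote([rf_preds, svm_preds, knn_preds])
print(ensemble_preds)  # ['malware', 'benign', 'malware', 'benign']
```

Each sample receives the label that at least two of the three base models agree on, so a single model's error on a sample is outvoted when the other two are correct.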
The rapid proliferation of Android devices has exposed the platform to a growing number of malware attacks. Emerging malware threats employ highly sophisticated and dynamic techniques for avoiding detection. This has continued to weaken the capacity of existing signature-based detection systems to protect against new and unknown threats. Thus, effective detection of unknown and novel Android malware has remained a growing challenge in the field of mobile and information security. This study therefore aimed to identify the best-performing machine learning classification algorithm for anomaly-based Android malware detection, using permission-based feature sets, by conducting a performance comparison of six classification algorithms: Naïve Bayes, Simple Logistic, Random Forest, PART, k-Nearest Neighbours (k-NN), and Support Vector Machine (SVM). The WEKA 3.8.2 suite was used for preprocessing the feature sets and for classification. The findings showed that Random Forest had the best detection result, with a false alarm rate of 2.2%, an accuracy of 97.4%, an error rate of 2.6%, and an ROC area of 99.6%. The study concluded that, using Android permission features, Random Forest and k-Nearest Neighbours recorded the best performance in Android malware detection, followed by the Support Vector Machine and Simple Logistic classification algorithms. Partial Decision Tree (PART) performed relatively well, while Naïve Bayes recorded the weakest performance. Consequently, the Random Forest and k-NN models are recommended for the development of an anomaly-based Android malware detection system.
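The metrics quoted above follow from a binary confusion matrix under the standard definitions. The sketch below assumes "false alarm rate" means the false positive rate, FP / (FP + TN). The function name `detection_metrics` and the counts are hypothetical and are not the study's data.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute common detection metrics from confusion-matrix counts.

    tp: malware correctly flagged      fn: malware missed
    fp: benign apps wrongly flagged    tn: benign apps correctly passed
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,        # always 1 - accuracy
        "false_alarm_rate": fp / (fp + tn),     # false positive rate
        "true_positive_rate": tp / (tp + fn),   # recall on the malware class
    }


# Hypothetical counts for 100 malware and 100 benign apps.
metrics = detection_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because accuracy and error rate are complementary, the reported 97.4% accuracy and 2.6% error rate sum to 100%, as expected.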
The key component that makes detection of Android malware possible is the availability of the right indicators in Android application packages, known as features or attributes. These are fundamental to training machine learning algorithms to produce the required detection model. Extracting these features from Android application packages is carried out through reverse engineering. This paper details the experimental process of applying a reverse engineering procedure, using Sublime Text 2 and the Androguard plugin, to Android application packages in order to extract permissions, which are the targeted features. The study further discusses the cleaning stages, using Notepad++, Microsoft Excel, and MS Word, in which the relevant features were retained and the noisy ones removed. A total of 1500 Android apps were downloaded from both benign and malicious sources and used for the experiment. The reverse engineering process yielded 162 cleaned features, which were used to form a binary feature matrix of size 1500 by 163 (162 permission features plus the class attribute).
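The binary feature matrix described above can be sketched as follows, assuming each app has already been reduced to a set of requested permissions and a class label. The function name `build_feature_matrix` and the three sample apps are hypothetical. The study's own extraction and cleaning used Androguard, Notepad++, Excel, and Word, not this code.

```python
def build_feature_matrix(apps):
    """Build a binary permission matrix with a trailing class column.

    apps: list of (set_of_permission_names, class_label) pairs.
    Returns (header, rows); each row has one 0/1 entry per permission
    followed by the class label.
    """
    # Column order: every permission seen in any app, sorted for stability.
    vocabulary = sorted({perm for perms, _label in apps for perm in perms})
    header = vocabulary + ["class"]

    rows = []
    for perms, label in apps:
        row = [1 if perm in perms else 0 for perm in vocabulary]
        row.append(label)
        rows.append(row)
    return header, rows


# Hypothetical apps; the permission names are standard Android permissions.
apps = [
    ({"android.permission.INTERNET", "android.permission.SEND_SMS"}, "malware"),
    ({"android.permission.INTERNET"}, "benign"),
    ({"android.permission.READ_CONTACTS", "android.permission.SEND_SMS"}, "malware"),
]

header, rows = build_feature_matrix(apps)
print(header)
for row in rows:
    print(row)
```

With 1500 apps and 162 retained permissions, the same construction gives the 1500 by 163 matrix reported in the study, the final column being the class.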