Given the frequent changes in the Android framework and the continuous evolution of Android malware, it is challenging to detect malware over time in an effective and scalable manner. To address this challenge, we propose DroidEvolver, an Android malware detection system that can automatically and continually update itself during malware detection without any human involvement. While most existing malware detection systems can be updated by retraining on new applications with true labels, DroidEvolver requires neither retraining nor true labels to update itself, because it makes only necessary, lightweight updates using online learning techniques with an evolving feature set and pseudo labels. The detection performance of DroidEvolver is evaluated on a dataset of 33,294 benign applications and 34,722 malicious applications developed over a period of six years. Using 6,286 applications dated 2011 as the initial training set, DroidEvolver achieves a high detection F-measure (95.27%), which declines by only 1.06% on average per year over the next five years when classifying 57,539 newly appeared applications. Note that such new applications could use new techniques and new APIs that are not known to DroidEvolver when it is initialized with 2011 applications. Compared with the state-of-the-art overtime malware detection system MAMADROID, the F-measure of DroidEvolver is 2.19 times higher on average (10.21 times higher for the fifth year), and the efficiency of DroidEvolver is 28.58 times higher than that of MAMADROID during malware detection. DroidEvolver is also shown to be robust against typical code obfuscation techniques.
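The idea of updating a detector with pseudo labels and a growing feature set can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the perceptron-style `OnlineLinearModel`, the three-model pool, the majority vote that produces the pseudo label, and the feature strings are all hypothetical.

```python
class OnlineLinearModel:
    """Perceptron-style model over a sparse feature set that can grow over time."""

    def __init__(self, lr=1.0):
        self.w = {}        # feature name -> weight; unseen features count as 0
        self.bias = 0.0
        self.lr = lr

    def predict(self, features):
        score = self.bias + sum(self.w.get(f, 0.0) for f in features)
        return 1 if score > 0 else -1   # 1 = malicious, -1 = benign

    def update(self, features, label):
        # Mistake-driven update: only change weights when the model disagrees.
        if self.predict(features) != label:
            for f in features:
                # New features enter the weight table here (evolving feature set).
                self.w[f] = self.w.get(f, 0.0) + self.lr * label
            self.bias += self.lr * label


def classify_and_evolve(pool, features):
    """Classify one app and update disagreeing models with the pseudo label."""
    votes = [m.predict(features) for m in pool]
    pseudo_label = 1 if sum(votes) > 0 else -1   # majority vote (odd pool size)
    for model, vote in zip(pool, votes):
        if vote != pseudo_label:
            model.update(features, pseudo_label)  # no true label is used
    return pseudo_label


# Hypothetical labeled seed data (the only place true labels appear).
seed = [
    ({"api:sendTextMessage", "perm:SEND_SMS"}, 1),
    ({"api:setContentView", "perm:INTERNET"}, -1),
    ({"api:getDeviceId", "perm:SEND_SMS"}, 1),
]

# Train each model on a different slice so the pool is not uniform.
pool = [OnlineLinearModel(), OnlineLinearModel(), OnlineLinearModel()]
for model, data in zip(pool, [seed[:2], seed, seed[1:2]]):
    for feats, label in data:
        model.update(feats, label)

# A later app that uses a feature no model has seen before.
new_app = {"api:sendTextMessage", "api:newFancyApi"}
label = classify_and_evolve(pool, new_app)
```

In this toy run the third model disagrees with the vote, so it alone is updated and picks up the previously unseen feature. A pseudo label can of course be wrong, in which case such an update reinforces the error; the sketch does nothing to guard against that.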
Android has been the most popular smartphone system, with multiple platform versions (e.g., KitKat and Lollipop) active in the market. To manage an application's compatibility with one or more platform versions, Android allows apps to declare the supported platform SDK versions in their manifest files. In this paper, we make a first effort to study this modern software mechanism. Our objective is to measure the current practice of the declared SDK versions (which we term DSDK versions hereafter) in real apps, and the consistency between the DSDK versions and the apps' API calls. To this end, we perform a three-dimensional analysis. First, we parse Android documents to obtain a mapping between each API and its corresponding platform versions. We then analyze the DSDK-API consistency for over 24K apps, among which we pre-exclude 1.3K apps that provide different app binaries for different Android versions through Google Play analysis. Besides shedding light on the current DSDK practice, our study quantitatively measures the two side effects of inappropriate DSDK versions: (i) around 1.8K apps have API calls that do not exist in some declared SDK versions, which causes runtime crash bugs on those platform versions; (ii) over 400 apps, due to claiming outdated targeted DSDK versions, are potentially exploitable by remote code execution. These results indicate the importance and difficulty of declaring correct DSDK versions, and our work can help developers fulfill this goal.
Recent years have witnessed the extraordinary success of Android, a smartphone operating system owned by Google. At the end of 2013, Android became the most sold phone and tablet OS. As of 2015, Android had the largest installed base of all operating systems. Along with the fast-evolving Android, its fragmentation problem becomes more and more serious. Although new devices ship with recent Android versions, there are still huge numbers of existing devices running old Android versions [1]. To better manage an application's compatibility with multiple platform versions, Android allows apps to declare the supported platform SDK versions in their manifest files. We term these declared SDK versions DSDK versions.
These two author names are in alphabetical order.
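The first side effect described above, API calls that do not exist on some declared platform versions, can be sketched as a simple range check. This is only an illustration under stated assumptions: the `DeclaredSdk` type, the `find_missing_apis` helper, the small `API_ADDED_IN` table, and the `newest_level` cap are all hypothetical, and the API levels listed should be verified against the official Android API reference rather than taken from this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DeclaredSdk:
    """DSDK values as they might be read from an app manifest (hypothetical type)."""
    min_sdk: int
    max_sdk: Optional[int] = None   # frequently left undeclared


def find_missing_apis(
    declared: DeclaredSdk,
    called_apis: Iterable[str],
    api_added_in: Dict[str, int],
    newest_level: int,
) -> List[Tuple[str, int, int]]:
    """Return (api, first_level, last_level) for each called API that is
    absent on part of the declared platform range."""
    upper = declared.max_sdk if declared.max_sdk is not None else newest_level
    problems = []
    for api in sorted(set(called_apis)):
        added = api_added_in.get(api)
        if added is None:
            continue   # API not in the mapping; this sketch cannot judge it
        if added > declared.min_sdk:
            # The API is missing on every declared level below `added`.
            problems.append((api, declared.min_sdk, min(added - 1, upper)))
    return problems


# Illustrative API-to-level mapping (assumed values, to be checked against the docs).
API_ADDED_IN = {
    "android.app.Activity.setContentView": 1,
    "android.webkit.WebView.evaluateJavascript": 19,
}

app = DeclaredSdk(min_sdk=16)
calls = [
    "android.app.Activity.setContentView",
    "android.webkit.WebView.evaluateJavascript",
]
report = find_missing_apis(app, calls, API_ADDED_IN, newest_level=23)
```

The check is deliberately conservative: a call guarded at runtime by a comparison against `Build.VERSION.SDK_INT` would not crash on older platforms, and the sketch does not model such guards or APIs that were later removed.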
A common problem in machine learning-based malware detection is that training data may contain noisy labels, and it is challenging to make the training data noise-free at a large scale. To address this problem, we propose a generic framework to reduce the noise level of training data for any machine learning-based Android malware detection approach. Our framework makes use of all intermediate states of two identical deep learning classification models during their training with a given noisy training dataset, and generates a noise-detection feature vector for each input sample. Our framework then applies a set of outlier detection algorithms to all noise-detection feature vectors to reduce the noise level of the given training data before feeding it to any machine learning-based Android malware detection approach. In our experiments with three different Android malware detection approaches, our framework detects significant portions of wrong labels in different training datasets at different noise ratios, and improves the performance of the Android malware detection approaches.
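The pipeline described above (record intermediate training states, build one noise-detection vector per sample, then run outlier detection) can be sketched in a few lines. This is a heavily simplified stand-in, not the framework itself: a one-dimensional logistic model replaces the deep learning classifiers, the two runs differ only in an assumed random seed, and a single one-sided modified z-score rule replaces the set of outlier detection algorithms. All names and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, median


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_and_record(samples, seed, epochs=10, lr=0.5):
    """Train a 1-D logistic model with full-batch gradient descent. After each
    epoch, record the probability assigned to every sample's *given* label."""
    rng = random.Random(seed)
    w, b = rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)
    history = [[] for _ in samples]
    for _ in range(epochs):
        grad_w = mean([(sigmoid(w * x + b) - y) * x for x, y in samples])
        grad_b = mean([sigmoid(w * x + b) - y for x, y in samples])
        w, b = w - lr * grad_w, b - lr * grad_b
        for i, (x, y) in enumerate(samples):
            p = sigmoid(w * x + b)
            history[i].append(p if y == 1 else 1.0 - p)
    return history


def noise_vectors(samples, seeds=(0, 1)):
    """Concatenate the per-epoch records of two identically configured runs."""
    runs = [train_and_record(samples, s) for s in seeds]
    return [runs[0][i] + runs[1][i] for i in range(len(samples))]


def flag_outliers(vectors, threshold=3.5):
    """Flag samples whose mean given-label probability is unusually low,
    using a one-sided modified z-score based on the median absolute deviation."""
    scores = [mean(v) for v in vectors]
    med = median(scores)
    mad = median([abs(s - med) for s in scores])
    if mad == 0:
        return []
    return [i for i, s in enumerate(scores) if 0.6745 * (med - s) / mad > threshold]


# Toy data: (feature, given label). The last label is deliberately flipped.
SAMPLES = [(-3.0, 0), (-2.5, 0), (-2.0, 0), (-1.5, 0), (-1.0, 0),
           (1.0, 1), (1.5, 1), (2.0, 1), (3.0, 1),
           (2.5, 0)]
vectors = noise_vectors(SAMPLES)
scores = [mean(v) for v in vectors]
suspects = flag_outliers(vectors)   # indices to drop or relabel before training
```

The intuition the sketch tries to capture is that a mislabeled sample tends to be assigned a low probability for its given label throughout training, so its vector sits apart from those of correctly labeled samples. Which samples end up in `suspects` depends on the threshold, which is an arbitrary choice here.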