Abstract: The booming popularity of smartphones is partly a result of application markets where users can easily download a wide range of third-party applications. However, due to the open nature of markets, especially on Android, there have been several privacy and security concerns with these applications. On Google Play, as with most other markets, users have direct access to natural-language descriptions of those applications, which give an intuitive idea of the functionality including the security-related information…
“…In a combination of static code inspection and text analysis, Watanabe et al [14] present a keyword-based technique to correlate access to privacy-relevant resources with app descriptions. AutoCog [10] correlates permission-related API calls with frequently occurring text fragments. As a result, semantic patterns are derived that can provide an insight into why Android apps request certain permissions.…”
Static and dynamic program analysis are the key concepts researchers apply to uncover security-critical implementation weaknesses in Android applications. As it is often not obvious in which context problematic statements occur, it is challenging to assess their practical impact. While some flaws may turn out to be bad practice but not undermine the overall security level, others could have a serious impact. Distinguishing them requires knowledge of the designated app purpose. In this paper, we introduce a machine learning-based system that is capable of generating natural language text describing the purpose and core functionality of Android apps based on their actual code. We design a dense neural network that captures the semantic relationships of resource identifiers, string constants, and API calls contained in apps to derive a high-level picture of implemented program behavior. For arbitrary applications, our system can predict precise, human-readable keywords and short phrases that indicate the main use-cases apps are designed for. We evaluate our solution on 67,040 real-world apps and find that with a precision between 69% and 84% we can identify keywords that also occur in the developer-provided description in Google Play. To avoid incomprehensible black box predictions, we apply a model explaining algorithm and demonstrate that our technique can substantially augment inspections of Android apps by contributing contextual information.
“…Another model, AutoCog proposed by Qu et al [3] is based on explicit semantic analysis (ESA) which provides a vectoral representation of a given text, thereby giving a representation of the meaning in the text. The model uses vectoral representations of app descriptions to assess their semantic relatedness to permissions.…”
Section: Related Work (mentioning)
confidence: 99%
“…If the selected sentence is related to the relevant permission group, it is labeled as 1; if not, it is labeled as 0. Unlike previous studies in the literature [3], [6], AC-Net labelled application descriptions according to permission groups rather than a single permission. Nine of these permission groups belong to dangerous permissions.…”
Section: Model (mentioning)
confidence: 99%
“…In this study, a new approach based on NLP and recurrent neural networks (RNNs) is employed in order to find inconsistencies between the requested permissions of an application and its metadata. This problem is known as description-to-fidelity problem [3] in the literature. Permissions [4] is one of the important security mechanisms introduced by Android, so users have to grant dangerous permissions before their usage.…”
Section: Introduction. Mobile Devices Have Become Inevitable Part Of Our Lives (mentioning)
confidence: 99%
“…We use Porter stemmer [17] (footnote 2: https://dynet.readthedocs.io/en/latest/tutorial.html; footnote 3: The implementation will be publicly available if the paper gets accepted.)…”
Android gives us the opportunity to extract meaningful information from metadata. From the security point of view, important information missing from the metadata of an application could be a sign of a suspicious application, which could be directed for extensive analysis. In particular, the usage of dangerous permissions is expected to be explained in app descriptions. The permission-to-description fidelity problem in the literature aims to discover such inconsistencies between the usage of permissions and descriptions. This study proposes a new method based on natural language processing and recurrent neural networks. The effect of user reviews on finding such inconsistencies is also investigated in addition to application descriptions. The experimental results show that high precision is obtained by the proposed solution, and the proposed method could be used for triage of Android applications.