Android application (app) stores contain a huge number of apps, which are manually classified into various categories based on the apps' descriptions. However, the predefined categories and app descriptions often fail to accurately reflect the real functionalities of apps. This leads to misclassified apps, which in turn may cause serious security and reliability problems in the app store. Automatic app classification is therefore essential for building a secure, reliable, integrated, and easy-to-navigate app store. In this paper, we propose an effective method called AndroClass that automatically classifies apps based on their real functionalities, using rich and comprehensive features that represent those functionalities. AndroClass performs three steps: feature extraction, feature refinement, and classification. In the feature extraction step, we extract 14 different features for each app by using a unified tool suite. In the feature refinement step, we apply the Random Forest algorithm to refine the features. In the classification step, we combine the refined features into a single feature vector, and AndroClass is equipped with four classifiers (K-Nearest Neighbor, Naive Bayes, Support Vector Machine, and Deep Neural Network) to classify apps. In contrast to existing methods, all the features utilized in AndroClass are stable and clearly represent the actual functionalities of the app; AndroClass raises no user-privacy concerns; and it can be applied to classify unreleased or newly released apps. The results of extensive experiments with two real-world datasets and a dataset constructed by human experts demonstrate the effectiveness of AndroClass; its classification accuracy on the expert-constructed dataset is 83.5%.
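The refinement and classification steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AndroClass itself: the feature-group names (`permissions`, `api_calls`), the importance scores, and the toy apps are all hypothetical, the Random Forest is assumed to have already produced per-feature importance scores (it is not trained here), and only one of the four classifiers (K-Nearest Neighbor) is shown.

```python
from collections import Counter
from math import dist

def top_indices(importances, keep):
    """Indices of the `keep` highest-scoring features, in their original order.
    The scores are assumed to come from a Random Forest trained elsewhere."""
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:keep])

def combine(groups, selections):
    """Concatenate the refined feature groups of one app into a single vector."""
    vector = []
    for name, indices in selections.items():
        vector.extend(groups[name][i] for i in indices)
    return vector

def knn_predict(train, query, k=3):
    """Majority vote among the k training apps closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical importance scores for two hypothetical feature groups.
importances = {
    "permissions": [0.05, 0.40, 0.30, 0.01],
    "api_calls": [0.20, 0.02, 0.35],
}
selections = {name: top_indices(s, keep=2) for name, s in importances.items()}

# Toy labeled apps (made-up values, not from the paper's datasets).
apps = [
    ({"permissions": [1, 1, 0, 0], "api_calls": [5, 0, 1]}, "Game"),
    ({"permissions": [0, 1, 0, 1], "api_calls": [4, 1, 0]}, "Game"),
    ({"permissions": [1, 0, 1, 0], "api_calls": [0, 3, 6]}, "Finance"),
    ({"permissions": [0, 0, 1, 1], "api_calls": [1, 2, 7]}, "Finance"),
]
train = [(combine(groups, selections), label) for groups, label in apps]

new_app = {"permissions": [1, 1, 0, 1], "api_calls": [6, 0, 0]}
predicted = knn_predict(train, combine(new_app, selections), k=3)
print(predicted)  # Game
```

Selecting feature indices once from the importance scores, rather than per app, keeps every combined vector aligned, which distance-based classifiers such as K-Nearest Neighbor require.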
Masoud REYHANI HAMEDANI†a), Nonmember, and Sang-Wook KIM†b), Member
SUMMARY In this paper, we propose SimCS (similarity based on contribution scores) to compute the similarity of scientific papers. For similarity computation, we exploit the notion of a contribution score, which indicates how much a paper contributes to another paper that cites it. We also consider the author dominance of papers when computing contribution scores. We perform extensive experiments with a real-world dataset to show the superiority of SimCS. In comparison with SimCC, the state-of-the-art method, SimCS not only requires no extra parameter tuning but also achieves higher accuracy in similarity computation.
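The summary does not give the SimCS formula, so the sketch below is only a generic stand-in showing why per-citation weights matter: it computes a weighted cosine similarity between two papers' reference lists, where each citation carries a contribution score. The papers (`A`, `B`, `C`), the cited works, and the scores are hypothetical, and author dominance is not modeled.

```python
from math import sqrt

def weighted_cosine(refs_a, refs_b):
    """Cosine similarity of two papers' reference vectors.
    refs_x maps each cited paper to an assumed contribution score for that citation."""
    common = set(refs_a) & set(refs_b)
    dot = sum(refs_a[p] * refs_b[p] for p in common)
    norm_a = sqrt(sum(w * w for w in refs_a.values()))
    norm_b = sqrt(sum(w * w for w in refs_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a paper with no (weighted) references is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical contribution scores: how much each cited paper contributes to the citer.
A = {"X": 0.9, "Y": 0.1}
B = {"X": 0.8, "Z": 0.6}
C = {"Y": 0.9, "Z": 0.2}

print(round(weighted_cosine(A, B), 3))  # 0.795
print(round(weighted_cosine(A, C), 3))  # 0.108
```

With unweighted citations, both pairs (A, B) and (A, C) share exactly one reference and would each score 0.5; the contribution weights separate them because A depends heavily on X but only marginally on Y.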
Although SimRank has been successfully applied to various applications as a link-based similarity measure, it suffers from a counter-intuitive property called the pairwise normalization problem; JacSim is a powerful variant of SimRank that effectively solves this problem. In this paper, we first point out three drawbacks of JacSim and then propose JacSim* to effectively solve them: JacSim* exploits the paths neglected by JacSim in similarity computation; its matrix form provides exact similarity scores without being sensitive to the number of node-pairs with common neighbors; and its formulas, in both iterative and matrix forms, are simpler, easier to understand, and easier to implement than those of JacSim. We conduct extensive experiments with eight real-world datasets to evaluate both the accuracy and the performance of JacSim* in comparison with those of JacSim. Our experimental results demonstrate that JacSim* achieves better accuracy than JacSim, and that the JacSim* matrix form is dramatically faster than its iterative form and than both forms of JacSim on all datasets.
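The pairwise normalization problem can be reproduced with plain iterative SimRank, where s(u, v) = C / (|I(u)| |I(v)|) times the sum of s(i, j) over all in-neighbor pairs i in I(u), j in I(v). Because the sum is divided by every pair, nodes sharing more in-neighbors can receive a lower score. The sketch below is standard SimRank only (not JacSim or JacSim*, whose formulas are not given here); the toy graph and the decay factor C = 0.8 are assumptions chosen for illustration.

```python
def simrank(in_nbrs, C=0.8, iters=10):
    """Plain iterative SimRank over a directed graph given as node -> in-neighbors."""
    nodes = list(in_nbrs)
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iters):
        new = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[(u, v)] = 1.0
                elif in_nbrs[u] and in_nbrs[v]:
                    # Average over ALL |I(u)| * |I(v)| in-neighbor pairs (pairwise normalization).
                    total = sum(sim[(i, j)] for i in in_nbrs[u] for j in in_nbrs[v])
                    new[(u, v)] = C * total / (len(in_nbrs[u]) * len(in_nbrs[v]))
                else:
                    new[(u, v)] = 0.0  # a node without in-neighbors is similar only to itself
        sim = new
    return sim

# Toy graph: a and b share ONE in-neighbor (p); c and d share TWO (q and r).
in_nbrs = {
    "p": [], "q": [], "r": [],
    "a": ["p"], "b": ["p"],
    "c": ["q", "r"], "d": ["q", "r"],
}
s = simrank(in_nbrs)
print(s[("a", "b")], s[("c", "d")])  # 0.8 0.4
```

Intuitively c and d should be at least as similar as a and b, since they share twice as many in-neighbors, yet SimRank halves their score because the dissimilar cross pairs (q, r) and (r, q) enter the average. This is the behavior that JacSim and JacSim* are designed to correct.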