Dissemination of malicious code, also known as malware, poses severe challenges to cyber security. Malware authors embed software in seemingly innocuous executables, unknown to a user. The malware subsequently interacts with security-critical OS resources on the host system or network, in order to destroy their information or to gather sensitive information such as passwords and credit card numbers. Malware authors typically use Application Programming Interface (API) calls to perpetrate these crimes. We present a model that uses text mining and topic modeling to detect malware, based on the types of API call sequences. We evaluated our technique on two publicly available datasets . We observed that Decision Tree and S upport Vector Machine yielded significant results. We performed t-test with respect to sensitivity for the two models and found that statistically there is no significant difference between these models. We recommend Decision Tree as it yields 'if-then' rules, which could be used as an early warning expert system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.