Malware detection via API calls, topic models and machine learning

Sundarkumar, G. Ganesh; Ravi, Vadlamani; Nwogu, Ifeoma; Govindaraju, Venu

doi:10.1109/coase.2015.7294263

Cited by 33 publications

(11 citation statements)

References 23 publications

(27 reference statements)


“…e work by Alazab et al [5] studied an automated method of extracting API call features and analysed them to understand their use for malicious purpose. Sundarkumar et al [6] presented a model, based on the types of API call sequences, using text mining and topic modeling to detect malware. Hachinyan [7] discussed proactive methods based on API call sequences analysis and proposed a method using a multiple sequence alignment to identify malware.…”

Section: Related Work (mentioning)

confidence: 99%

Malware Detection Using CNN via Word Embedding in Cloud Computing Infrastructure

Wang; Tian; Lin

2021

Scientific Programming

The Internet of Things (IoT), cloud, and fog computing paradigms provide a powerful large-scale computing infrastructure for a variety of data- and computation-intensive applications. These cutting-edge computing infrastructures, however, remain vulnerable to serious security and privacy risks. One of the most important countermeasures against cybersecurity threats is intrusion detection and prevention systems, which monitor devices, networks, and systems for malicious activity and policy violations. The detection and prevention systems range from antivirus software to hierarchical systems that monitor the traffic of whole backbone networks. At the moment, the primary defensive solutions are based on malware feature extraction. Most known feature extraction algorithms use byte N-gram patterns or binary strings to represent log files or other static information. In this article, the information taken from program files is expressed using word embedding (GloVe) and a newly proposed feature extraction method. As a result, the relevant vector space model (VSM) incorporates more information about unknown programs. We use a convolutional neural network (CNN) to analyze the feature maps represented by word embedding and apply Softmax to fit the probability of a malicious program. We consider a program to be malicious if the probability is greater than 0.5; otherwise, it is a benign program. Experimental results show that our approach achieves an accuracy higher than 98%.
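The final decision step described in this abstract (Softmax probability, then a 0.5 cutoff) can be sketched as follows. This is only an illustration under stated assumptions: the two raw scores stand in for the output of a CNN that is not reproduced here, and the function names `softmax` and `classify` and the (benign, malicious) score ordering are hypothetical, not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.5):
    """Label a program from two raw scores ordered (benign, malicious).

    The ordering and the function name are assumptions for this sketch.
    """
    p_malicious = softmax(logits)[1]
    label = "malicious" if p_malicious > threshold else "benign"
    return label, p_malicious

# Hypothetical scores standing in for a trained network's output layer.
label, prob = classify([0.2, 1.7])
print(label, round(prob, 3))
```

Because the comparison is strict, a probability of exactly 0.5 falls on the benign side, which matches the "greater than 0.5" wording of the abstract.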


“…Where: TF = Term frequency IDF = Inverse document frequency As mentioned, TF-IDF is popular among the researchers when doing API call analysis. Among the researchers that use TF-IDF in their research is Sundarkumar et al (2015), Pektas and Acarman (2017), (Altawaier and Tiun, 2016) and (Bai et al, 2014).…”

Section: Term Frequency (mentioning)

confidence: 99%
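The TF-IDF weighting named in the citation statement above can be sketched on API call traces as follows. This is a minimal illustration, assuming the common textbook variant (relative term frequency multiplied by log(N / df)); the cited papers may use a different normalization. The function name `tf_idf` and the sample traces are hypothetical.

```python
import math
from collections import Counter

def tf_idf(traces):
    """Return one {api_name: weight} dict per trace.

    TF is the count of an API in a trace divided by the trace length.
    IDF is log(N / df), where df is the number of traces containing the API.
    """
    n = len(traces)
    # Document frequency: count each API once per trace.
    df = Counter(api for trace in traces for api in set(trace))
    weights = []
    for trace in traces:
        counts = Counter(trace)
        weights.append({
            api: (count / len(trace)) * math.log(n / df[api])
            for api, count in counts.items()
        })
    return weights

# Hypothetical traces; the API names are real Windows calls used as tokens.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey", "CloseHandle"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
]
weights = tf_idf(traces)
for trace_weights in weights:
    print(sorted(trace_weights.items(), key=lambda kv: -kv[1]))
```

An API that appears in every trace, such as `CloseHandle` here, receives a weight of zero under this variant, which is why TF-IDF tends to surface calls that distinguish one sample from the rest.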

“…They also use a soft clustering algorithm which is a non-Negative Matrix Factorization (NMF) to extract the API call topics which will be used to detect similar but unknown malware. Sundarkumar et al (2015) wrote that API level information inside the bytecode is beneficial to analyze software malevolence tendency since it shows the behavior of said executable which the API call sequence of. They also assert that the main problem in using Topic Model is the lots of choices in features, hence why, they propose to apply Latent Dirichlet Allocation (LDA) as a feature selection method in their research.…”

Section: Introduction (mentioning)

confidence: 99%


Concordance and Term Frequency in Analyzing API Calls for Malware Behavior Detection

Wahab; Mohd; Muniyandi; et al.

2019

Journal of Computer Science

An Application Programming Interface (API) is used by software to interact with an operating system to perform tasks such as opening a file, deleting a file and many more. Programmers use the API to make it easier for their programs to communicate with the operating system without knowledge of the hardware of the target system. A malware author is an attacker who may belong to an organization or work alone. Malware authors who are capable of writing their own malware use the same kinds of APIs that are used to create normal programs. Much research has been done in this field; however, most researchers used n-grams to detect the sequence of API calls, and although this gave good results, it is time consuming to process all the output. For this reason, this paper proposes using Concordance to search for the API call sequence of a malware sample, because it uses KWIC (Key Word in Context) and thus displays only the output related to the queried keyword. After that, Term Frequency (TF) is used to find the most commonly used APIs in the dataset. The results of the experiment show that concordance can be used to search for API call sequences, as we managed to identify six malicious behaviors (Install Itself at Startup, Enumerate All Process, Privilege Escalation, Terminate Process, Process Hollowing and Anti-Debugging) using this method. Based on the TF score, the most commonly used API in the dataset is RegCloseKey (TF: 1.388), which on its own is not a dangerous API. Hence we can infer that most APIs are not malicious in nature; it is how they are used that makes them dangerous.
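The KWIC lookup and term-frequency count described in this abstract can be sketched as follows. This is a rough illustration only: the function names `concordance` and `term_frequency`, the `window` parameter, and the sample trace are hypothetical, and the relative-frequency TF used here is an assumption, since the abstract does not state the formula behind its reported score.

```python
from collections import Counter

def concordance(calls, keyword, window=2):
    """Return (left context, keyword, right context) for each hit of keyword."""
    hits = []
    for i, call in enumerate(calls):
        if call == keyword:
            left = calls[max(0, i - window):i]
            right = calls[i + 1:i + 1 + window]
            hits.append((left, call, right))
    return hits

def term_frequency(calls):
    """Relative frequency of each API name in one trace (assumed TF variant)."""
    counts = Counter(calls)
    return {api: count / len(calls) for api, count in counts.items()}

# Hypothetical trace; the API names are real Windows calls used as tokens.
trace = [
    "OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
    "CreateRemoteThread", "CloseHandle",
    "OpenProcess", "TerminateProcess", "CloseHandle",
]

# Show each occurrence of the queried call with its neighbours.
for left, hit, right in concordance(trace, "OpenProcess", window=1):
    print(left, hit, right)

# Rank APIs by how often they occur in the trace.
print(sorted(term_frequency(trace).items(), key=lambda kv: -kv[1]))
```

Querying a single keyword keeps the output limited to the calls surrounding that keyword, which is the practical advantage over enumerating every n-gram that the abstract points to.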


“…Sundarkumar et al [16] tried to use API information to characterize Android malware. They use text mining and topic modelling, combined with machine learning classifier, to detect malwares.…”

Section: Machine Learning Model (mentioning)

confidence: 99%

DMIA: A Malware Detection System on iOS Platform

Liu; Xie; Song

2016

Computer Science & Information Technology (CS & IT)

iOS is a popular operating system on Apple's smartphones, and recent security events such as XcodeGhost have shown that users' private data can be stolen in iOS without detection. We therefore present the design and implementation of a malware vetting system called DMIA. DMIA first collects runtime information of an app and then distinguishes between malicious and normal apps using a novel machine learning model. We evaluated DMIA with 1000 apps from the official App Store. The experimental results show that DMIA is effective in detecting malware that aims to steal private data.
