N-Grams-Based File Signatures for Malware Detection

Santos, Igor; Penya, Yoseba K.; Devesa, Jaime; Bringas, Pablo García

doi:10.5220/0001863603170320

Cited by 137 publications

(94 citation statements)

References 7 publications

(6 reference statements)


“…Already in 1994, Kephart at IBM has proposed to use N-grams for malware analysis (Kephart 1994). More recently a large body of research in malware detection based on machine learning have opted for n-grams to generate file/program signatures for the training dataset of malware (Henchiri and Japkowicz 2006; Kolter and Maloof 2006; Santos et al. 2009). Despite the high performance claimed by the authors for very small datasets, between 500 and 3 000 software programs, we believe that a malware detector based on n-grams, because of its vulnerability to obfuscation, could be trivially defeated by malware authors.…”

Section: Methods (mentioning)

confidence: 93%

Empirical assessment of machine learning-based malware detectors for Android

et al. 2014


To address the issue of malware detection through large sets of applications, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. So far, several promising results were recorded in the literature, many approaches being assessed with what we call in the lab validation scenarios. This paper revisits the purpose of malware detection to discuss whether such in the lab validation scenarios provide reliable indications on the performance of malware detectors in real-world settings, aka in the wild. To this end, we have devised several Machine Learning classifiers that rely on a set of features built from applications' CFGs. We use a sizeable dataset of over 50 000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that, in the lab, our approach outperforms existing machine learning-based approaches. However, this high performance does not translate in high performance in the wild. The performance gap we observed (F-measures dropping from over 0.9 in the lab to below 0.1 in the wild) raises one important question: How do state-of-the-art approaches perform in the wild?



“…Google reacted in setting up Google Bouncer, which scans applications before submission. However, according to a Kaspersky security bulletin 4, no significant change has been observed. Researchers [1] have found ways to bypass the Google Bouncer in fingerprinting the Android emulator used by the service.…”

Section: Introduction (mentioning)

confidence: 98%

“…Known feature sets that have already been used in the past to detect malicious programs: n-grams [4], opcodes [5], Android permissions combined with Control Flow Graphs [6] and several others. Finding the feature set that generalizes the most our observable is the most challenging task.…”

Section: A Feature Extraction (mentioning)

confidence: 99%

Using opcode-sequences to detect malicious Android applications

Jérôme, Allix, State et al. 2014

2014 IEEE International Conference on Communications (ICC)


Abstract: Recently, the Android platform has seen its number of malicious applications increase sharply. Motivated by the easy application submission process and the number of alternative market places for distributing Android applications, rogue authors are constantly developing new malicious programs. While current anti-virus software mainly relies on signature detection, the issue of alternative malware detection has to be addressed. In this paper, we present a feature based detection mechanism relying on opcode-sequences combined with machine learning techniques. We assess our tool on both a reference dataset known as Genome Project as well as on a wider sample of 40,000 applications retrieved from the Google Play Store.


“…Santos et al [35] propose opcode ngrams to detect malware using a dataset composed by 1000 malware and 1000 trusted computer applications. They conclude that using 2gram the detection ratio is quite low, achieving a maximum value of 69.66%, thus 2-grams do not seem to be appropriate for malware detection.…”

Section: Related Work (mentioning)

confidence: 99%

Effectiveness of Opcode ngrams for Detection of Multi Family Android Malware

Canfora, Lorenzo, Medvet et al. 2015

2015 10th International Conference on Availability, Reliability and Security


Abstract: With the wide diffusion of smartphones and their usage in a plethora of processes and activities, these devices have been handling an increasing variety of sensitive resources. Attackers are hence producing a large number of malware applications for Android (the most spread mobile platform), often by slightly modifying existing applications, which results in malware being organized in families. Some works in the literature showed that opcodes are informative for detecting malware, not only in the Android platform. In this paper, we investigate if frequencies of ngrams of opcodes are effective in detecting Android malware and if there is some significant malware family for which they are more or less effective. To this end, we designed a method based on state-of-the-art classifiers applied to frequencies of opcodes ngrams. Then, we experimentally evaluated it on a recent dataset composed of 11120 applications, 5560 of which are malware belonging to several different families. Results show that an accuracy of 97% can be obtained on the average, whereas perfect detection rate is achieved for more than one malware family.
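
The "frequencies of ngrams of opcodes" feature named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `ngram_frequencies`, the sliding-window definition, the normalization by window count, and the sample opcode list are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ngram_frequencies(opcodes: List[str], n: int) -> Dict[Tuple[str, ...], float]:
    """Return the relative frequency of each length-n window of opcodes.

    Assumption: an n-gram is a contiguous window of n opcodes, and the
    frequency is its count divided by the total number of windows.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of size n over the sequence, one opcode at a time.
    windows = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    if not windows:
        # Sequence shorter than n: there are no n-grams to count.
        return {}
    counts = Counter(windows)
    total = len(windows)
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode sequence, for illustration only.
sample = ["move", "invoke", "move", "invoke", "return"]
freqs = ngram_frequencies(sample, 2)
```

A frequency dictionary like this would typically be mapped to a fixed-length vector (one position per n-gram seen in the training set) before being passed to a classifier; that step is left out here.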

