Proceedings of the 11th International Conference on Enterprise Information 2009
DOI: 10.5220/0001863603170320

N-Grams-Based File Signatures for Malware Detection

Abstract: Malware is any malicious code that has the potential to harm a computer or network. The amount of malware is increasing faster every year and poses a serious security threat. Thus, malware detection is a critical topic in computer security. Currently, signature-based detection is the most widely used method for detecting malware. Although this method is still used in most popular commercial antivirus software, it can only achieve detection once the virus has already caused damage and has been registered. …

Citations: cited by 137 publications (94 citation statements)
References: 7 publications (6 reference statements)
“…Already in 1994, Kephart at IBM has proposed to use N-grams for malware analysis (Kephart 1994). More recently a large body of research in malware detection based on machine learning have opted for n-grams to generate file/program signatures for the training dataset of malware (Henchiri and Japkowicz 2006;Kolter and Maloof 2006;Santos et al 2009). Despite the high performance claimed by the authors for very small datasets, between 500 and 3 000 software programs, we believe that a malware detector based on n-grams, because of its vulnerability to obfuscation, could be trivially defeated by malware authors.…”
Section: Methods
Citation type: mentioning
confidence: 93%
“…Google reacted in setting up Google Bouncer, which scans applications before submission. However, according to a Karspersky security bulletin 4 , no significant change has been observed. Researchers [1] have found ways to bypass the Google Bouncer in fingerprinting the Android emulator used by the service.…”
Section: Introduction
Citation type: mentioning
confidence: 98%
“…Known feature sets that have already been used in the past to detect malicious programs: n-grams [4], opcodes [5], Android permissions combined with Control Flow Graphs [6] and several others. Finding the feature set that generalizes the most our observable is the most challenging task.…”
Section: A Feature Extraction
Citation type: mentioning
confidence: 99%
“…Santos et al [35] propose opcode ngrams to detect malware using a dataset composed by 1000 malware and 1000 trusted computer applications. They conclude that using 2gram the detection ratio is quite low, achieving a maximum value of 69.66%, thus 2-grams do not seem to be appropriate for malware detection.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
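
Illustration (not from the paper): the citation statements above refer to n-grams as features for file signatures. As a rough sketch of the general idea only, the snippet below counts overlapping byte n-grams in a buffer and keeps the most frequent ones. The function names `byte_ngrams` and `ngram_signature`, the `top_k` cutoff, and the sample bytes are all assumptions made for this example; the excerpt does not describe how the cited paper actually builds or compares its signatures.

```python
from collections import Counter
from typing import List


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping window of n consecutive bytes in data."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A buffer shorter than n yields an empty range, hence an empty Counter.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def ngram_signature(data: bytes, n: int = 2, top_k: int = 3) -> List[bytes]:
    """Hypothetical 'signature': the top_k most frequent n-grams.

    Ties are broken by byte order so the result is deterministic.
    """
    counts = byte_ngrams(data, n)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


# Made-up sample buffer, chosen only so that some 2-grams repeat.
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B, 0xEC, 0x90])
counts = byte_ngrams(sample, 2)
signature = ngram_signature(sample, n=2, top_k=2)
```

A sequence of length L produces L - n + 1 overlapping n-grams, so the 7-byte sample gives six 2-grams. The same windowing applies to opcode sequences, as in the opcode n-gram work mentioned in the last statement, with opcodes in place of raw bytes.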