Consideration for Creation of Training Samples for Targeted Malware Detection by Machine Learning

Uda, Ryuya; Kotani, Taeko

doi:10.1109/icict58900.2023.00031

2023 6th International Conference on Information and Computer Technologies (ICICT) 2023

Cited by 1 publication

References 6 publications

Fast Preprocessing by Suffix Arrays for Managing Byte n-grams to Detect Malware Subspecies by Machine Learning

Kita, Uda (2024). Journal of Information Processing. Self cite.

Although machine learning methods based on byte n-grams have achieved high scores in classifying malware and benignware, they do not appear to be used in current anti-virus software. A performance bottleneck of these methods is the handling of byte n-grams during preprocessing, such as top-k selection. Extracting all byte n-grams, which is required for selecting the top-k n-grams, takes a long time. Moreover, if several values of n are to be used, such as 4-grams, 8-grams and 16-grams, the n-grams must be extracted again for each n. Therefore, we proposed a fast preprocessing method that extracts n-grams by applying a suffix array algorithm. Our method can also manage byte n-grams of multiple lengths at the same time, and it includes the selection of feature n-grams, such as top-k n-grams by information gain. On the other hand, our method has the limitation that it is applicable only to a large number of samples in the same malware subspecies family, and such families have become extinct. We evaluated the speed of our method by comparing it with conventional approaches. We also evaluated our method by machine learning with actual samples from four old malware subspecies families. We hope that our method may be applicable to detecting current targeted malware.
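The abstract's central idea, counting byte n-grams of several lengths from one suffix array instead of re-scanning the data for each n, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_suffix_array`, `count_ngrams` and `count_ngrams_multi` are hypothetical, and the suffix array is built naively by sorting suffixes, which is far slower than the construction a real preprocessing pipeline would need. The sketch relies only on the fact that, in sorted suffix order, all suffixes sharing the same first n bytes are adjacent.

```python
def build_suffix_array(data: bytes) -> list[int]:
    """Return suffix start offsets in lexicographic order of the suffixes.

    Naive construction by sorting slices; an assumption for illustration,
    not an efficient suffix array algorithm.
    """
    return sorted(range(len(data)), key=lambda i: data[i:])


def count_ngrams(data: bytes, sa: list[int], n: int) -> dict[bytes, int]:
    """Count every byte n-gram by scanning runs of equal prefixes in `sa`."""
    counts: dict[bytes, int] = {}
    prev = None
    run = 0
    for start in sa:
        gram = data[start:start + n]
        if len(gram) < n:
            # Suffix is shorter than n, so it contains no n-gram.
            continue
        if gram == prev:
            run += 1
        else:
            if prev is not None:
                counts[prev] = run
            prev, run = gram, 1
    if prev is not None:
        counts[prev] = run
    return counts


def count_ngrams_multi(data: bytes, lengths: tuple[int, ...]) -> dict[int, dict[bytes, int]]:
    """Build the suffix array once and reuse it for every requested length."""
    sa = build_suffix_array(data)
    return {n: count_ngrams(data, sa, n) for n in lengths}


if __name__ == "__main__":
    sample = b"abababc"  # toy input standing in for a binary file
    for n, table in count_ngrams_multi(sample, (2, 4)).items():
        print(n, table)
```

The design point is that the sort is done once; each additional n only costs one more linear pass over the same array. Selecting top-k n-grams by information gain, which the abstract also mentions, would operate on count tables like these and is not shown here.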
