Fast Preprocessing by Suffix Arrays for Managing Byte n-grams to Detect Malware Subspecies by Machine Learning
Kouhei Kita, Ryuya Uda
Abstract: Although machine learning methods using byte n-grams have achieved high scores in classifying malware and benignware, they do not appear to be used in current anti-virus software. A performance bottleneck of these methods is handling byte n-grams during preprocessing, such as top-k selection: extracting all of the byte n-grams required to select the top-k n-grams takes a long time. Moreover, if several values of n are to be used, such as 4-grams, 8-grams and 16-grams, the n-grams for each n must be extracted ag…
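The abstract is cut off before it explains how suffix arrays are used, so the following is only a generic sketch, not the authors' method, of why one suffix array can serve several values of n. In a suffix array, all suffixes that begin with the same n-gram sit next to each other, so a single pass that compares each suffix with its predecessor (via the longest-common-prefix, or LCP, array) groups identical n-grams for any n without re-extracting them. Assumptions: the function names (`build_suffix_array`, `lcp_array`, `count_ngrams_sa`, `count_ngrams_naive`, `top_k`) are hypothetical; the suffix array is built naively by sorting; the LCP array uses the Kasai et al. algorithm; and counts are raw occurrences within one byte string rather than frequencies aggregated across a corpus of files.

```python
from collections import Counter
from typing import List, Tuple


def build_suffix_array(data: bytes) -> List[int]:
    # Naive construction (assumption): sort suffix start offsets by suffix content.
    return sorted(range(len(data)), key=lambda i: data[i:])


def lcp_array(data: bytes, sa: List[int]) -> List[int]:
    # Kasai et al.: lcp[r] = length of the common prefix of suffixes sa[r-1] and sa[r].
    size = len(data)
    rank = [0] * size
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * size
    h = 0
    for i in range(size):  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # the suffix just before suffix i in sorted order
            while i + h < size and j + h < size and data[i + h] == data[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next text suffix shares at least h - 1 bytes
        else:
            h = 0
    return lcp


def count_ngrams_sa(data: bytes, sa: List[int], lcp: List[int], n: int) -> Counter:
    # Count n-grams with one scan of the shared suffix array; any n >= 1 works.
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    gram = b""
    for r, start in enumerate(sa):
        if len(data) - start < n:
            continue  # suffix too short: no n-gram starts here
        if r == 0 or lcp[r] < n:
            gram = data[start:start + n]  # first suffix of a new group
        counts[gram] += 1  # suffixes sharing >= n leading bytes share this n-gram
    return counts


def count_ngrams_naive(data: bytes, n: int) -> Counter:
    # Baseline: re-extract every n-gram for this particular n.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def top_k(counts: Counter, k: int) -> List[Tuple[bytes, int]]:
    # Highest count first; ties broken by lexicographically smaller n-gram.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    sample = b"\x55\x8b\xec" * 10 + b"\x90" * 20
    sa = build_suffix_array(sample)  # built once
    lcp = lcp_array(sample, sa)  # built once
    for n in (4, 8, 16):  # reused for every n
        print(n, top_k(count_ngrams_sa(sample, sa, lcp, n), 3))
```

The sorting work is done once, and each additional n costs roughly one linear scan over the suffix array. The naive `sorted` construction materializes every suffix as a key, so its memory use grows quadratically in the worst case; a linear-time construction such as SA-IS would be needed for real executables.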