Abstract: In order to classify a web page as being benign or malicious, we designed 14 basic and 16 extended features. The basic features that we implemented were selected to represent the essential characteristics of a web page. The system heuristically combines two basic features into one extended feature in order to effectively distinguish benign and malicious pages. The support vector machine can be trained to successfully classify pages by using these features. Because more and more malicious web pages are appearin…
“…Finally Hwang, et al [9] suggested a method for classifying the malicious web pages by an adaptive support vector machine. To classify the malicious web pages, they defined the features to represent the essential characteristics of a web page and selected an adaptive support vector machine (aSVM) for learning training data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Firstly, dynamic analysis by the used features utilizes information such as the frequency or sequence of API call [1], [3]-[5], compiled hexadecimal code [2], program execution paths [8] and others [5]-[7] as the feature. Secondly, analysis by applied techniques utilizes a sequence alignment [1], [2] and data mining or machine learning [2]-[5], [9] for the collected feature data.…”
Section: Related Work (mentioning)
confidence: 99%
“…as well as API calls [1], [3]-[5] and control flows. Although the algorithms or methods of various fields have been applied for malware detection, such as the machine learning, data mining and various algorithms [1]-[5], [9], we introduce a dynamic analysis method based on malware behavior information with API call sequences and Multiple Sequence Alignment (MSA) for malware detection.…”
SUMMARY: The recent cyber-attacks utilize various malware as a means of attacks for the attacker's malicious purposes. They are aimed to steal confidential information or seize control over major facilities after infiltrating the network of a target organization. Attackers generally create new malware or many different types of malware by using an automatic malware creation tool which enables remote control over a target system easily and disturbs trace-back of these attacks. The paper proposes a generation method of malware behavior patterns as well as the detection techniques in order to detect the known and even unknown malware efficiently. The behavior patterns of malware are generated with Multiple Sequence Alignment (MSA) of API call sequences of malware. Consequently, we defined these behavior patterns as a "feature-chain" of malware for the analytical purpose. The initial generation of the feature-chain consists of extracting API call sequences with API hooking library, classifying malware samples by the similar behavior, and making the representative sequences from the MSA results. The detection mechanism of numerous malware is performed by measuring similarity between API call sequence of a target process (suspicious executables) and feature-chain of malware. By comparing with other existing methods, we proved the effectiveness of our proposed method based on Longest Common Subsequence (LCS) algorithm. Also we evaluated that our method outperforms other antivirus systems with 2.55 times in detection rate and 1.33 times in accuracy rate for malware detection.
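The summary above describes detection as measuring the similarity between a target process's API call sequence and a malware feature-chain, using the Longest Common Subsequence (LCS) algorithm. The following is a minimal Python sketch of that idea, not the authors' implementation. The API call names, the function names `lcs_length` and `similarity`, and the choice to normalise by the feature-chain length are all assumptions made for illustration; the summary does not specify them.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(target: Sequence[str], feature_chain: Sequence[str]) -> float:
    """Fraction of the feature-chain recovered, in order, from the target.

    Normalising by the feature-chain length is an assumption of this sketch.
    """
    if not feature_chain:
        return 0.0
    return lcs_length(target, feature_chain) / len(feature_chain)


# Hypothetical sequences; real ones would come from an API hooking library.
feature_chain = ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateProcessW"]
target = [
    "LoadLibraryW", "CreateFileW", "ReadFile",
    "WriteFile", "CreateProcessW", "ExitProcess",
]

score = similarity(target, feature_chain)
print(score)  # 0.75: three of the four feature-chain calls appear in order
```

A detector built on this would compare `score` against a threshold. The summary does not state a threshold value, so any cutoff would have to be chosen empirically.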
“…These HTML tags consist of <link>, <object>, <form>, <script>, <embed>, <ilayer>, <layer>, <style>, <applet>, <meta>, <img>, <iframe>, and many more. For instance, the XSS worm "Samy" infected MySpace webpages by injecting a huge quantity of XSS payload in the <div> tag of the webpages [20]. On the other hand, JavaScript language is used in a webpage for embedding tasks, but an attacker can misuse some methods on the embedded XSS payload such as exec(), fromCharCode(), eval, alert(), getElementsByTagName(), write(), unscape(), and escape() [21].…”
Social networking services (SNSs) such as Twitter, MySpace, and Facebook have become progressively significant with its billions of users. Still, alongside this increase is an increase in security threats such as cross-site scripting (XSS) threat. Recently, a few approaches have been proposed to detect an XSS attack on SNSs. Due to the certain recent features of SNSs webpages such as JavaScript and AJAX, however, the existing approaches are not efficient in combating XSS attack on SNSs. In this paper, we propose a machine learning-based approach to detecting XSS attack on SNSs. In our approach, the detection of XSS attack is performed based on three features: URLs, webpage, and SNSs. A dataset is prepared by collecting 1,000 SNSs webpages and extracting the features from these webpages. Ten different machine learning classifiers are used on a prepared dataset to classify webpages into two categories: XSS or non-XSS. To validate the efficiency of the proposed approach, we evaluated and compared it with other existing approaches. The evaluation results show that our approach attains better performance in the SNS environment, recording the highest accuracy of 0.972 and lowest false positive rate of 0.87.
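The citation statement above lists HTML tags and JavaScript methods that an XSS payload may misuse, and the abstract describes extracting webpage features for a classifier. As a rough illustration only, the Python sketch below counts occurrences of those tags and method calls in a page. The function name `extract_features`, the feature naming scheme, and the regex-based call matching are assumptions of this sketch, not the paper's extraction procedure. The quoted `unscape()` is read here as the standard JavaScript `unescape()`, which is also an assumption.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tags and methods taken from the citation statement above.
SUSPICIOUS_TAGS = {
    "link", "object", "form", "script", "embed", "ilayer",
    "layer", "style", "applet", "meta", "img", "iframe",
}
SUSPICIOUS_METHODS = (
    "exec", "fromCharCode", "eval", "alert",
    "getElementsByTagName", "write", "unescape", "escape",
)


class TagCounter(HTMLParser):
    """Counts start tags that belong to SUSPICIOUS_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in SUSPICIOUS_TAGS:
            self.counts[tag] += 1


def extract_features(html: str) -> dict:
    """Return per-tag and per-method occurrence counts for one page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    features = {f"tag_{t}": parser.counts[t] for t in sorted(SUSPICIOUS_TAGS)}
    for method in SUSPICIOUS_METHODS:
        # The lookbehind keeps "escape(" from matching inside "unescape(".
        pattern = r"(?<![A-Za-z0-9_$])" + re.escape(method) + r"\s*\("
        features[f"js_{method}"] = len(re.findall(pattern, html))
    return features


# Hypothetical page fragment for demonstration.
page = (
    '<div><script>eval(unescape("%61"));'
    "document.write(String.fromCharCode(88))</script>"
    '<iframe src="x"></iframe><img src="a.png"></div>'
)
features = extract_features(page)
print(features["tag_script"], features["js_eval"], features["js_escape"])
```

The resulting dictionary could serve as one row of a feature table for any of the classifiers the abstract mentions. Counting calls with a regular expression is a simplification, since obfuscated payloads can hide method names from such matching.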
“…Xu et al [15] presented a web page classification algorithm, Link Information Categorization (LIC), to solve the traditional classification algorithms based on the analysis of web content that cannot implement effective classification. In order to classify a web page as being benign or malicious, Hwang et al [16] designed the system of 14 basic and 16 extended features that heuristically combined two basic features into one extended feature in order to effectively distinguish benign and malicious pages.…”
Abstract: Considering the explosive growth of data, the increased amount of text data's effect on the performance of text categorization forward the need for higher requirements, such that the existing classification method cannot be satisfied. Based on the study of existing text classification technology and semantics, this paper puts forward a kind of Chinese text classification oriented SAW (Structural Auxiliary Word) algorithm. The algorithm uses the special space effect of Chinese text where words have an implied correlation between text information mining and text categorization for high-correlation matching. Experiments show that SAW classification algorithm on the premise of ensuring precision in classification, significantly improve the classification precision and recall, obviously improving the performance of information retrieval, and providing an effective means of data use in the era of big data information extraction.