2006
DOI: 10.1007/s11416-006-0030-0

Language models for detection of unknown attacks in network traffic

Abstract: In this paper we propose a method for network intrusion detection based on language models. Our method proceeds by extracting language features such as n-grams and words from connection payloads and applying unsupervised anomaly detection without prior learning phase or presence of labeled data. The essential part of this procedure is linear-time computation of similarity measures between language models of connection payloads. Particular patterns in these models decisive for discrimination of attacks and norm…
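
As a rough illustration of the idea in the abstract, the sketch below extracts byte n-grams from a payload and compares two payloads by the Manhattan distance between their n-gram counts. It is a minimal sketch, not the paper's implementation: the function names, the choice of n = 3, the distance measure, and the example payloads are all assumptions made here, and plain dictionaries stand in for whatever data structure the authors use to reach linear time.

```python
from collections import Counter


def ngram_counts(payload: bytes, n: int = 3) -> Counter:
    """Count every contiguous n-byte substring of the payload."""
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


def manhattan_distance(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences over the union of both n-gram sets."""
    # Counter returns 0 for missing keys, so absent n-grams need no special case.
    return sum(abs(a[g] - b[g]) for g in a.keys() | b.keys())


# Hypothetical payloads, chosen only for illustration.
normal = ngram_counts(b"GET /index.html HTTP/1.1")
similar = ngram_counts(b"GET /about.html HTTP/1.1")
odd = ngram_counts(b"GET /\x90\x90\x90\x90\x90\x90\x90\x90 HTTP/1.1")

print(manhattan_distance(normal, similar))
print(manhattan_distance(normal, odd))
```

In this toy case the payload containing a run of unusual bytes lies farther from the reference request than the ordinary one does. An unsupervised detector would flag such outliers without needing labeled data.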

citations
Cited by 82 publications
(36 citation statements)
references
References 49 publications
“…They have also been used for traffic analysis problems with success [21]. This technique combines the hostnames of n adjacent events to obtain a composite hostname.…”
Section: Transformation Of Frequency Vectors (mentioning)
confidence: 99%
“…Considering the global variety of development platforms and the mobility of threats facilitated by the Internet, ensuring the external validity of this study relies substantially on reaching a critical mass of CFL files which represents abundant development platforms. Furthermore, it often does not suffice for a signature to be available: deployed signatures must be managed, distributed and kept up-to-date by security administrators [16].…”
Section: Results (mentioning)
confidence: 99%
“…Based on the results from [24] and [26], and on our own experiments, a set of 20 boundary symbols that provides the highest percentage of meaningful words for HTTP is defined. These 20 delimiters are:…”
Section: Methods (mentioning)
confidence: 99%
“…The frequency analysis of multiple consecutive bytes, n-grams, in the payload has been proposed in [25]. The use of payload words, seen as consecutive bytes separated by delimiters, to make a language model was considered in [26]. The authors of [27] take a different view and use HTTP GET request parameters and their values as the starting point for the model: the length, character positions, the structure, the values and existence of parameters are considered.…”
Section: Related Work (mentioning)
confidence: 99%
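
The "payload words" idea in the statement above (consecutive bytes separated by delimiters) can be sketched as follows. This is an assumption-laden illustration and not the method of any cited paper: the delimiter set `DELIMITERS` is hypothetical, since the actual 20 delimiters are elided in the excerpt, and the function names are invented here.

```python
import re
from collections import Counter

# Hypothetical delimiter bytes for illustration only; the 20 delimiters
# chosen in the citing paper are not shown in the excerpt above.
DELIMITERS = b" \r\n/?&=.:"

# One or more delimiter bytes in a row count as a single separator.
_SPLIT = re.compile(b"[" + re.escape(DELIMITERS) + b"]+")


def payload_words(payload: bytes) -> list:
    """Split a payload into words: maximal runs of non-delimiter bytes."""
    return [w for w in _SPLIT.split(payload) if w]


def word_counts(payload: bytes) -> Counter:
    """Bag-of-words model of a payload: word -> number of occurrences."""
    return Counter(payload_words(payload))


request = b"GET /index.html?id=1 HTTP/1.1\r\n"
print(payload_words(request))
print(word_counts(request).most_common(3))
```

The resulting word counts can be compared between payloads with the same kind of distance measure used for n-gram counts.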