Abstract-Pattern-matching techniques have recently been applied to network security applications such as intrusion detection, virus protection, and spam filters. The widely used Aho-Corasick (AC) algorithm can simultaneously match multiple patterns while providing a worst-case performance guarantee. However, as transmission technologies improve, the AC algorithm cannot keep up with transmission speeds in high-speed networks. Moreover, it may require a huge amount of space to store a two-dimensional state transition table when the total length of patterns is large. In this paper, we present a pattern-matching architecture consisting of a stateful pre-filter and an AC-based verification engine. The stateful pre-filter is optimal in the sense that it is equivalent to utilizing all previous query results. In addition, the filter can be easily realized with bitmaps and simple bitwise-AND and shift operations. The size of the two-dimensional state transition table in our proposed architecture is proportional to the number of patterns, as opposed to the total length of patterns in previous designs. Our proposed architecture achieves a significant improvement in both throughput performance and memory usage.Index Terms-Aho-Corasick (AC) algorithm, Bloom filter, deep packet inspection, pattern matching.
Abstract-Because of its accuracy, signature matching is considered an important technique in anti-virus/worm applications. Among some famous pattern matching algorithms, the Aho-Corasick (AC) algorithm can match multiple patterns simultaneously and guarantee deterministic performance under all circumstances and thus is widely adopted in various systems, especially when worst-case performance such as wire speed requirement is a design factor. However, the AC algorithm was developed only for strings while virus/worm signatures could be specified by simple regular expressions. In this paper, we generalize the AC algorithm to systematically construct a finite state pattern matching machine which can indicate the ending position in a finite input string for the first occurrence of virus/worm signatures that are specified by strings or simple regular expressions. The regular expressions studied in this paper may contain the following operators: * (match any number of symbols), ? (match any symbol), and {min, max} (match minimum of min, maximum of max symbols), which are defined in ClamAV, a popular open source anti-virus/worm software module, for signature specification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.