The problem of finding the occurrences of a pattern P[0...m-1] in a text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from an alphabet of size σ, is called the exact string matching problem. The problem of searching for a set of patterns P0, P1, P2, ..., Pr-1, r ≥ 1, in a given text T is called the multi-pattern string matching problem. This problem has previously been solved by bit-parallel string matching algorithms: shift-or and backward non-deterministic DAWG matching (BNDM). In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where the patterns are taken as "limited expressions". We define a limited expression as a subset of extended patterns that excludes regular expressions and optional and repeatable characters. Examples include case-sensitive patterns and patterns containing classes of characters. The set of r patterns can be handled by converting it into a single pattern P, either by using classes of characters or by concatenating the characters of each pattern. We assume that all patterns have equal length m and that the total length of the pattern (after pre-processing) is less than or equal to the word length (w) of the computer used. We compare the performance of the multi-pattern q-gram BNDM algorithm with that of the existing BNDM algorithm.
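The classes-of-characters conversion described above can be illustrated with a minimal sketch. It is not the paper's implementation: it runs plain BNDM (without q-grams) over the superimposed patterns, and the function name `multi_bndm_classes` and the final verification step are assumptions made for illustration. Position i of the combined pattern accepts any character that some input pattern has at position i, so the search can report false candidates, and each candidate window is therefore checked against the original set. Python integers are unbounded, so the constraint m ≤ w is not enforced here.

```python
def multi_bndm_classes(patterns, text):
    """Return (position, pattern) pairs; illustrative sketch, not the paper's code."""
    if not patterns or not patterns[0]:
        raise ValueError("need at least one non-empty pattern")
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("all patterns must have the same length m")

    n = len(text)
    mask = (1 << m) - 1          # keeps the state within m bits
    high = 1 << (m - 1)          # set when the text read so far is a prefix

    # Superimpose the patterns: bit (m-1-i) of B[c] is set when at least
    # one pattern has character c at position i (a class of characters).
    B = {}
    for p in patterns:
        for i, c in enumerate(p):
            B[c] = B.get(c, 0) | (1 << (m - 1 - i))

    wanted = set(patterns)
    hits = []
    pos = 0
    while pos <= n - m:
        j, last, D = m, m, mask
        # Scan the window backward; D becomes 0 after at most m steps.
        while D:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    last = j     # a prefix ends here: remember a safe shift
                else:
                    window = text[pos:pos + m]
                    if window in wanted:   # filter out class-only matches
                        hits.append((pos, window))
            D = (D << 1) & mask
        pos += last
    return hits
```

For example, with the patterns "ACGT" and "AGGA" the window "ACGA" matches the combined class pattern but is rejected by the verification step, because it is not one of the original patterns.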
The problem of finding the occurrences of a pattern P[0...m-1] in a text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from an alphabet of size σ, is called the exact string matching problem. Today, pattern matching is a powerful tool for locating nucleotide or amino acid sequence patterns in biological sequence databases. The problem of searching for a set of patterns P0, P1, P2, ..., Pr-1, r ≥ 1, in a given text T is called the multi-pattern string matching problem. The multi-pattern string matching problem has previously been solved by efficient bit-parallel string matching algorithms: shift-or and BNDM. Many other types of algorithms exist for the same purpose, but bit-parallelism has been shown to be more efficient than the others. In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where the patterns are arbitrary DNA patterns. We assume that all patterns have equal length m and that the total length of the pattern is less than or equal to the word length (w) of the computer used. Since the BNDM algorithm has been shown to be faster than any other bit-parallel string matching algorithm (G. Navarro, 2000), we compare the performance of the multi-pattern q-gram BNDM algorithm with that of the existing BNDM algorithm for different values of q and different numbers of patterns (r).
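The q-gram idea can be sketched for a set of equal-length DNA patterns as follows. This is a simplified variant written for illustration, not the algorithm evaluated in the paper: it reads the last q characters of each window in one step, continues backward as in BNDM, and shifts past the point where the text read so far stops being a factor of the combined pattern, rather than reproducing the published shift bookkeeping. The function name `multi_bndm_qgram`, the class-based combination of the patterns, and the verification against the original set are all assumptions. Python integers are unbounded, so the word-length limit w is not modelled.

```python
def multi_bndm_qgram(patterns, text, q):
    """Return start positions of any pattern; simplified q-gram BNDM sketch."""
    if not patterns or not patterns[0]:
        raise ValueError("need at least one non-empty pattern")
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("all patterns must have the same length m")
    if not 1 <= q <= m:
        raise ValueError("q must satisfy 1 <= q <= m")

    n = len(text)
    mask = (1 << m) - 1

    # Bit (m-1-i) of B[c] is set when some pattern has c at position i.
    B = {}
    for p in patterns:
        for i, c in enumerate(p):
            B[c] = B.get(c, 0) | (1 << (m - 1 - i))

    wanted = set(patterns)
    hits = []
    pos = 0
    while pos <= n - m:
        # Read the q-gram at the end of the window in one step.
        D = mask
        for k in range(q):
            D &= B.get(text[pos + m - q + k], 0) << k

        # Continue backward one character at a time while D is non-zero.
        j = m - q                # window index of the leftmost character read
        while D and j > 0:
            j -= 1
            D = (D << 1) & B.get(text[pos + j], 0)

        if D:
            # The whole window matched the combined pattern: verify it.
            if text[pos:pos + m] in wanted:
                hits.append(pos)
            pos += 1             # conservative shift after a candidate
        else:
            # text[pos+j : pos+m] is not a factor, so no occurrence can
            # start at or before pos + j.
            pos += j + 1
    return hits
```

When the q-gram itself is not a factor of the combined pattern, the window moves forward by m - q + 1 positions without reading any further characters, which is where the speed-up from larger q is expected to come from.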