A technique is described for implementing the test which determines if one string is a substring of another. When there is low probability that the test will be satisfied, it is shown how the operation can be speeded up considerably if it is preceded by a test on appropriately chosen hash codes of the strings.

This work was done under AEC contract number AT (30-1) 1480 V.

This note describes a fast implementation of the test which determines if one string contains a specified substring. The scheme makes use of hashing techniques and of the ability to do many Boolean operations in parallel on a standard computer. It is most useful when the same strings are being tested repeatedly, and when the probability of finding the substring is small.

The scheme makes use of three ideas. The first is that if the search is likely to be unsuccessful, it can usefully be preceded by a computationally faster test for necessary but not sufficient conditions that the substring be found. A simple example of such a test is the comparison of the lengths of the strings, but this is usually too weak a test to be useful.

The second is that a string can be represented by the set of its substrings, and in particular by the set of its substrings of a specified length. In general such a representation is not unique, but it does preserve the substring property in the sense that, if one string has another string as a substring, the set of substrings of the first will include the set of substrings of the second. Because of the lack of uniqueness, the reverse is not true, of course.

The third is that a set S can be represented by a binary string $b_1 b_2 b_3 \ldots b_m$ in which a value of 1 for $b_i$ indicates that S contains at least one element of the set $E_i$. In general such a representation is not unique, unless each $E_i$ contains exactly one element and each possible element is contained in some $E_i$.
However, it preserves the subset property in the sense that, if set $S_1$ is a subset of set $S_2$, the binary string representing $S_2$ will have ones in all positions where the string representing $S_1$ has ones. Accordingly, consider a string S to be represented by a binary string $b_1 b_2 b_3 \ldots b_m$ constructed as follows:
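
As a rough illustration only, the Python sketch below shows one way the three ideas might be combined. Every concrete choice in it is an assumption made for the example and not necessarily the author's construction: the substring length `K`, the signature width `M`, the use of `zlib.crc32` as the hash, and the function names `signature`, `may_contain`, and `contains`. Here each set $E_i$ is taken to be the set of length-`K` substrings whose hash falls in bucket `i`.

```python
import zlib

K = 2    # assumed substring length (hypothetical choice)
M = 64   # assumed signature width in bits (hypothetical choice)


def signature(s: str, k: int = K, m: int = M) -> int:
    """Return an m-bit integer; bit i is 1 if some length-k substring
    of s hashes to bucket i."""
    sig = 0
    for j in range(len(s) - k + 1):
        gram = s[j:j + k].encode("utf-8")
        # crc32 is used only because it is deterministic across runs.
        sig |= 1 << (zlib.crc32(gram) % m)
    return sig


def may_contain(text_sig: int, pat_sig: int) -> bool:
    """Necessary (not sufficient) condition for the substring test:
    every 1 bit of the pattern signature is also set in the text signature.
    One AND and one NOT compare all m positions in parallel."""
    return pat_sig & ~text_sig == 0


def contains(text: str, pattern: str, text_sig: int, pat_sig: int) -> bool:
    """Run the cheap signature test first; do the full search only if it passes."""
    if not may_contain(text_sig, pat_sig):
        return False          # definitely not a substring
    return pattern in text    # signature test passed, so confirm exactly
```

The signatures would be computed once per string and reused across many tests, which is the situation in which the text says the scheme is most useful. A pattern shorter than `K` gets an all-zero signature, so the quick test always passes and the full search decides the result.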