“…A contiguous subsequence contained in D is called a shingle [9,10]. Given a document D, we can associate to it its w-shingling, defined as the bag (multiset) of all shingles of size w contained in D. So, for instance, the 4-shingling of (a, rose, is, a, rose, is, a, rose) is the bag {(a, rose, is, a); (rose, is, a, rose); (is, a, rose, is); (a, rose, is, a); (rose, is, a, rose)}.…”
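
The definition above can be illustrated with a minimal Python sketch. It assumes the document has already been split into a sequence of tokens; the function name `w_shingling` and the choice of `collections.Counter` to represent the bag are illustrative assumptions, not taken from the quoted source.

```python
from collections import Counter


def w_shingling(tokens, w):
    """Return the bag (multiset) of all contiguous w-token shingles.

    Hypothetical helper for illustration: `tokens` is an already
    tokenized document, and the bag is modeled as a Counter mapping
    each shingle (a tuple of tokens) to its multiplicity.
    """
    if w <= 0:
        raise ValueError("w must be a positive integer")
    # Slide a window of width w over the tokens; a document shorter
    # than w yields an empty bag because the range is empty.
    return Counter(tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1))


doc = ("a", "rose", "is", "a", "rose", "is", "a", "rose")
bag = w_shingling(doc, 4)

# Multiplicities match the example in the text: five shingles in total,
# two of which occur twice.
for shingle, count in bag.items():
    print(shingle, count)
```

Representing the bag as a `Counter` keeps the repeated shingles (a, rose, is, a) and (rose, is, a, rose) with multiplicity 2 rather than collapsing them, which is the difference between a bag and a set.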