Background: Motif scanning is a very common method in bioinformatics. Its objective is to detect motifs sufficiently similar to the query, which are then used to determine family membership or to assign structural or functional features. Given this variety of uses, the accuracy of motif scanning procedures is of great importance.
Results: We present a new approach for improving motif scanning accuracy, based on analysis of in-between similarity. Given a set of motifs obtained from a scanning process, we construct an associated weighted graph. We also compute the expected weight of an edge in such a graph. It turns out that restricting the results to the maximal clique in the graph, computed with respect to the expected weight, greatly increases precision and hence improves the accuracy of the scan. We tested the method on an ungapped motif-characterized protein family from five plant proteomes. The method was applied to three iterative motif scanners (PSI-BLAST, JackHMMer and IGLOSS) with very good results.
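The clique-based filtering idea can be illustrated with a minimal Python sketch. It is not the implementation used in this work; it rests on several assumptions made only for illustration: the edge weight is taken to be the number of identical positions between two ungapped motifs of equal length, the expected weight is computed under an independent, uniform background residue distribution, an edge is kept only if its weight exceeds that expectation, and the clique is found with a basic Bron-Kerbosch search with pivoting. All function names (`identity`, `expected_identity`, `threshold_graph`, `largest_clique`) are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Edge weight (assumed): number of identical positions in two ungapped motifs."""
    return sum(x == y for x, y in zip(a, b))


def expected_identity(length, freqs):
    """Expected identity of two random motifs under an i.i.d. background (assumed model)."""
    return length * sum(p * p for p in freqs.values())


def threshold_graph(motifs, threshold):
    """Adjacency sets: connect two motifs when their weight exceeds the threshold."""
    adj = {i: set() for i in range(len(motifs))}
    for i, j in combinations(range(len(motifs)), 2):
        if identity(motifs[i], motifs[j]) > threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def largest_clique(adj):
    """Largest clique found by Bron-Kerbosch enumeration with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest seen so far
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand([], set(adj), set())
    return sorted(best)


# Toy scan output: three related hits and one unrelated hit (made-up sequences).
hits = ["ACDEFG", "ACDEFH", "ACDKFG", "WYWYWY"]
background = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # uniform, assumed
cutoff = expected_identity(len(hits[0]), background)
kept = largest_clique(threshold_graph(hits, cutoff))
print([hits[i] for i in kept])
```

In this toy example the unrelated hit shares no positions with the other three, so it has no edges and falls outside the clique. Exact clique search takes exponential time in the worst case, so a sketch like this is only practical for small or sparse result sets.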
Conclusions: We presented a method for improving protein motif scanning accuracy and have successfully applied it in several situations. The method has wider implications for general pattern recognition and feature extraction strategies, as long as one can determine the expected similarity between the objects under consideration.