We present DAVROS, a method to detect, localize, and validate repeating motifs in protein structures while allowing for insertions and deletions (indels). DAVROS takes the score matrix from a structural alignment program (SAP) and searches it for repeating motifs using an algorithm based on concepts from signal processing and on the statistical properties of the alignments. The method was tested against a nonredundant set of Protein Data Bank chains, and each chain was assigned a score. Of the top 50 chains ranked by score, 70% contain repeating motifs that were detected without error. These represent 14 types of fold covering the alpha, beta, and alpha/beta protein classes. A second data set, comprising protein chains from different sequence families for the triosephosphate isomerase (TIM) barrel, leucine-rich repeat (LRR), trefoil, and alpha-alpha barrel folds, was used to assess the ability of DAVROS to detect all motifs within a specific fold. For this second test set, the percentage of motifs detected was highest for the LRR chains (88.7%) and lowest for the TIM barrels (60%). This variability reflects the regularity of the LRR motif compared with the alpha/beta units of the TIM barrel, which generally have many more indels. Indels reduce the strength of the repeat signal in the SAP matrix, making repeat detection more difficult.
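To make the idea of a "repeat signal" in a score matrix concrete, the following is a minimal sketch and not the DAVROS algorithm itself. It assumes a square self-comparison score matrix in which entry (i, j) is high when residues i and j are structurally equivalent, averages the scores along each off-diagonal, and takes the offset with the strongest mean as a candidate repeat period. The function names `diagonal_profile` and `estimate_period`, the `min_offset` parameter, and the toy matrix are all hypothetical and chosen only for illustration.

```python
def diagonal_profile(score):
    """Mean score along each off-diagonal k, i.e. over pairs (i, i + k)."""
    n = len(score)
    profile = [0.0] * n
    for k in range(1, n):
        total = sum(score[i][i + k] for i in range(n - k))
        profile[k] = total / (n - k)
    return profile


def estimate_period(score, min_offset=2):
    """Return the offset whose off-diagonal has the highest mean score.

    Only offsets up to half the chain length are considered, so that at
    least two copies of the motif fit. Ties go to the smallest offset.
    """
    profile = diagonal_profile(score)
    candidates = range(min_offset, len(score) // 2 + 1)
    return max(candidates, key=lambda k: profile[k])


# Toy matrix (an assumption, not real SAP output): a perfect repeat of
# length 7 over 28 positions, with a low background score elsewhere.
n, period = 28, 7
toy = [[1.0 if (j - i) % period == 0 else 0.1 for j in range(n)]
       for i in range(n)]

print(estimate_period(toy))
```

In this idealized case every repeat unit has the same length, so the signal sits on straight off-diagonals. Indels shift equivalent residues off those diagonals, which smears the profile and weakens the peak; this is consistent with the abstract's observation that folds with many indels are harder to detect.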