1996
DOI: 10.1002/pro.5560050703

Multiple domain protein diagnostic patterns

Abstract: We have implemented an iterative algorithm for the identification of diagnostic patterns from sets of multiple-domain proteins, where domains need not be common to all the proteins in the defining set. Our algorithm was applied to sequences gathered using a variety of methods, including BLAST, common keywords, and common E.C. numbers. In all cases, useful diagnostic patterns were obtained, possessing both high sensitivity and specificity. The patterns were found to correlate in several cases with both functiona…

citations
Cited by 17 publications
(7 citation statements)
references
References 26 publications
“…To systematically clean up the annotation mistakes already in the databases and prevent propagation of annotation errors, we must thoroughly examine the sequences in the databases, dissect all the identifiable protein domains, construct protein domain profiles, and carefully annotate the domains in a way that is consistent with all the domain members. We and others have started such efforts (Adams et al, 1996; Henikoff and Henikoff, 1991; Sonnhammer et al, 1997). Only after carefully annotated protein domain libraries are constructed will all the transitive annotation mistakes be identified.…”
Section: Discussion
mentioning
confidence: 99%
“…These features are all categorically-valued. Among others, they include designations for a polymorphism that introduces a charged amino acid at a residue position that is inaccessible to solvent in the model structure (buried charge), 12 polymorphisms that substitute a glycine or a proline amino acid in a region of helical secondary structure in the model (helix breaking), polymorphisms that occur at the conserved glycine or proline in a turn (turn breaking), and polymorphisms that introduce an amino acid that is not represented in the phylogenetic profile of the polymorphic residue (unusual amino acid) or its amino acid class (unusual amino acid by class) 46 (Table 1B). Amino acid substitution matrices (e.g.…”
Section: The Predictive Features
mentioning
confidence: 99%
“…Recently, several methods for predicting domain boundaries from amino acid sequence have been proposed on the basis of a multiple sequence alignment (Park and Teichmann 1988; Sonnhammer and Kahn 1994; Adams et al 1996; Gracy and Argos 1998; Guan and Du 1998; Gouzy et al 1999; George and Heringa 2002) and on statistically derived distributions of domain lengths (Wheelan et al 2000). However, these methods can only be successful at identifying domains if the sequence has detectable similarity to other sequence fragments in databases or when the length of the unknown domains does not substantially deviate from the average of known protein structures.…”
mentioning
confidence: 99%