2014
DOI: 10.1089/cmb.2013.0098

Evaluating, Comparing, and Interpreting Protein Domain Hierarchies

Abstract: Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific …

Cited by 7 publications
(9 citation statements)
References 11 publications
“…The 262,126 sequences were aligned using MAPGAPS [68], a rapid and accurate alignment procedure. This was used as an input for mcBPPS [69], a Bayesian pattern based partitioning algorithm, which identifies residue positions in the alignment that most distinguish PTK sequences (‘foreground’) from STK (‘background’ sequences). In our analysis, we excluded tyrosine kinase-like sequences (TKLs), which share some of the PTK features in the kinase core.…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Note that the probability distribution (as defined in Eq 4 of Methods) treats each column position as statistically independent; hence BPPS models correlations indirectly, based on the hierarchy. S5 Fig illustrates how BPPS sampling produces hierarchies that, based on measures of statistical significance, are generally superior to hierarchies created at the NCBI using a combination of phylogenetic analysis and manual-curation; for in-depth evaluations of protein domain hierarchies in this way, see [23, 24, 76]. …”
Section: Results; citation type: mentioning
confidence: 99%
“…In principle, finding the global optimum for all but the simplest hierarchies is nearly impossible. In practice, however, we find that the various hierarchies obtained from run to run generally share the biologically most important features of the superfamily [76] and that the features most difficult to model optimally are least important. Therefore, failing to find the global optimum often forfeits little of biological significance.…”
Section: Results; citation type: mentioning
confidence: 99%
“…In some cases, these may be because of evolutionary events that defy preconceived notions and thus are inconsistent with a single phylogenetic tree. In a companion article (Neuwald, 2013), I explore this phenomenon and other aspects of domain hierarchies in greater detail.…”
Section: Neuwald; citation type: mentioning
confidence: 99%