2007
DOI: 10.1098/rsif.2007.1047

Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution

Abstract: The gap between the amount of genome information released by genome sequencing projects and our knowledge about the proteins' functions is rapidly increasing. To fill this gap, various 'genomic-context' methods have been proposed that exploit sequenced genomes to predict the functions of the encoded proteins. One class of methods, phylogenetic profiling, predicts protein function by correlating the phylogenetic distribution of genes with that of other genes or phenotypic characteristics. The functions of a num…

Cited by 84 publications (98 citation statements)
References 168 publications
“…First, the phyletic profiles (PP) method represents the COG/NOG gene families (OGs; see below) by the presence/absence patterns of their member genes across 2,071 genomes, and then makes inferences about gene functions by comparing such patterns via pairwise similarity ( Fig. S1a; Pellegrini et al, 1999;Kensche et al, 2008;de Vienne and Azé, 2012) or by machine learning (Tian et al, 2008;Škunca et al, 2013). Second, biophysical and protein sequence properties (BPS) method includes 1,170 features representing amino acid composition, particular motifs or periodicities (King et al, 2001;Jensen et al, 2003;Lanckriet et al, 2004;Minneci et al, 2013) and various sequence statistics (summary in Sec.…”
Section: Representing Gene Families Using Diverse Sets Of Genomic Fea… (mentioning)
confidence: 99%
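The pairwise-similarity idea in the statement above can be illustrated with a minimal sketch. It assumes binary presence/absence profiles and uses Jaccard similarity as one possible measure; the cited works may use other measures. The family names (`OG_A`, `OG_B`, `OG_C`), the six-genome toy profiles, and the helper names `jaccard` and `rank_pairs` are hypothetical and not taken from any of the cited papers.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # genomes containing both families
    either = sum(1 for x, y in zip(a, b) if x or y)   # genomes containing at least one
    return both / either if either else 0.0


# Hypothetical toy data: one profile per gene family, one column per genome.
profiles = {
    "OG_A": [1, 1, 0, 1, 0, 0],
    "OG_B": [1, 1, 0, 1, 0, 1],
    "OG_C": [0, 0, 1, 0, 1, 1],
}


def rank_pairs(profiles):
    """Return all family pairs sorted from most to least similar profile."""
    scores = [
        ((f, g), jaccard(profiles[f], profiles[g]))
        for f, g in combinations(sorted(profiles), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for pair, score in rank_pairs(profiles):
        print(pair, round(score, 3))
```

In this toy data set, the pair with the most similar distribution across genomes ranks first; under the phylogenetic profiling assumption, such pairs would be candidates for a shared function.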
“…Thus, revealing sets of proteins coherently appeared in different organisms may facilitate the search of functional modules in genome structure. Recently there were many attempts to reveal modular structure in genome data sets using different blind statistical methods, such as cluster analysis, independent component analysis and others (see [19] for review). Since the concept of genome functional modularity is completely compatible with BFA generative model described here, it was a challenge for us to apply BFA based methods to reveal hidden factor structure in some large genome data set.…”
Section: Application To Genome Data Set Analysis (mentioning)
confidence: 99%
“…Galperin & Koonin, 2000;Moyle et al, 1994). Coevolution of proteins may be assessed at sequence level (sequence co-evolution) by correlating evolutionary rates (Clark et al, 2011), or at gene family level (gene family evolution) by correlating occurrence vectors (Kensche et al, 2008). An occurrence vector or a phylogenetic profile (phyletic pattern) (Tatusov et al, 1997) is an encoding of protein's (homologue's) presence or absence within a given set of species of interest (Kensche et al, 2008).…”
Section: Evolution and Protein-protein Interaction (mentioning)
confidence: 99%
“…Coevolution of proteins may be assessed at sequence level (sequence co-evolution) by correlating evolutionary rates (Clark et al, 2011), or at gene family level (gene family evolution) by correlating occurrence vectors (Kensche et al, 2008). An occurrence vector or a phylogenetic profile (phyletic pattern) (Tatusov et al, 1997) is an encoding of protein's (homologue's) presence or absence within a given set of species of interest (Kensche et al, 2008). In general, the methods for correlating protein evolution have been successfully applied to predict a physical or functional interaction between proteins (Clark et al, 2011;Kensche et al, 2008), where sequence co-evolution is powerful in predicting the physical interaction and phylogenetic profiling is a good indicator of functional interplay between proteins in a broader sense.…”
Section: Evolution and Protein-protein Interaction (mentioning)
confidence: 99%
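As a rough illustration of the occurrence-vector encoding described in the statement above, the sketch below encodes a family's presence or absence over a fixed species list and correlates two such vectors. Pearson correlation is used here only as one plausible choice of correlation measure; the species labels and the helper names `occurrence_vector` and `pearson` are hypothetical.

```python
from math import sqrt


def occurrence_vector(present_in, species):
    """Encode presence (1) or absence (0) of a family in each species, in order."""
    return [1 if s in present_in else 0 for s in species]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    norm_u = sqrt(sum((x - mean_u) ** 2 for x in u))
    norm_v = sqrt(sum((y - mean_v) ** 2 for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / (norm_u * norm_v)


# Hypothetical species set and family distributions.
species = ["sp1", "sp2", "sp3", "sp4"]
fam_x = occurrence_vector({"sp1", "sp3"}, species)
fam_y = occurrence_vector({"sp1", "sp3"}, species)
fam_z = occurrence_vector({"sp2", "sp4"}, species)

if __name__ == "__main__":
    print(fam_x, fam_z)
    print(pearson(fam_x, fam_y), pearson(fam_x, fam_z))
```

Families with identical distributions correlate perfectly, while families with complementary distributions anti-correlate. Note that this naive measure treats species as independent observations and ignores their phylogenetic relatedness.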