2002
DOI: 10.1093/oxfordjournals.molbev.a004152

Accuracy and Power of Bayes Prediction of Amino Acid Sites Under Positive Selection

Abstract: Bayes prediction quantifies uncertainty by assigning posterior probabilities. It was used to identify amino acids in a protein under recurrent diversifying selection indicated by higher nonsynonymous (d(N)) than synonymous (d(S)) substitution rates or by omega = d(N)/d(S) > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different omega ratios. The Bayes theorem was used to calculate the posterior probabilities of each site falling …
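The Bayes step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the three site classes, and all numeric values are hypothetical. In a real analysis the per-class site likelihoods would come from evaluating the codon model on the tree at the maximum likelihood parameter estimates; here they are simply supplied as numbers.

```python
from typing import List, Sequence


def site_class_posteriors(
    proportions: Sequence[float], site_likelihoods: Sequence[float]
) -> List[float]:
    """Posterior probability of each site class for one site.

    Applies Bayes' theorem:
        P(class k | data) = p_k * f(data | omega_k) / sum_j p_j * f(data | omega_j)
    where p_k is the estimated proportion of sites in class k and
    f(data | omega_k) is the likelihood of the site's data under class k.
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("one likelihood is required per site class")
    joint = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("site likelihoods must not all be zero")
    return [j / total for j in joint]


# Hypothetical values for illustration only (not taken from the paper).
omegas = [0.1, 1.0, 3.5]               # omega ratio of each site class
proportions = [0.6, 0.3, 0.1]          # estimated proportion of sites per class
site_likelihoods = [1e-6, 4e-6, 5e-5]  # f(data at this site | omega_k)

posteriors = site_class_posteriors(proportions, site_likelihoods)

# Probability that the site is under positive selection: the summed
# posterior of all classes with omega > 1.
p_positive = sum(p for p, w in zip(posteriors, omegas) if w > 1.0)
```

A site would then be reported as positively selected when `p_positive` exceeds a chosen cutoff. Because the proportions and omega ratios are themselves estimates, errors in them carry over into the posteriors.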

Cited by 371 publications (295 citation statements)
References 25 publications
“…This procedure guarantees that the simulated dataset has the same distribution of parameters as the real data set, including potential confounding factors such as codon usage bias or long branches. These results and those of Anisimova et al (Anisimova, Bielawski et al 2002; Anisimova and Yang 2007) showed that the maximum likelihood estimate is robust to dS saturation, even for large divergence as shown by our simulations with doubled branch lengths. This is probably due to the use of more sequences, which "break" the long branches of the gene tree.…”
Section: What Is the Effect Of Genome Duplication On The Incidence Of
type: supporting
confidence: 90%
“…Simulations suggest a minimum number of six sequences in the alignment to have enough power and accuracy (Anisimova, Bielawski et al 2001; Anisimova, Bielawski et al 2002; Anisimova and Yang 2007). This is in itself a major issue for genomic studies involving too few species (e.g.…”
Section: Sampling
type: mentioning
confidence: 99%
“…Empirical results reported by Yang et al (2000a) and simulations by Anisimova et al (2001, 2002) indicated that the LRTs and the inference of sites under positive selection do not seem to be sensitive to the assumed tree topology (a neighbor-joining tree in our analyses), even if a star tree is used. Hence, presumably, our results are not biased by whichever phylogenetic process (clonal, epidemic, or panmictic) drives the population structure of the studied pathogens.…”
Section: 23
type: mentioning
confidence: 56%
“…This impacts Bayesian site identification, as ML parameter estimates are used to compute the posterior probabilities. Simulation studies showed that low accuracy in Bayesian site identification occurs when sequence divergence is very low or too few sequences are sampled because under such conditions the sampling errors in ML parameter estimates are too high (Anisimova et al 2002). Similarly, suboptimal parameter estimates, based on local optimum, also could lead to low accuracy in Bayesian site identification.…”
Section: Discussion
type: mentioning
confidence: 99%