2008
DOI: 10.1093/bioinformatics/btn445

Empirical profile mixture models for phylogenetic reconstruction

Abstract: In this work, we introduce an expectation-maximization algorithm for estimating amino acid profile mixtures from alignment databases. We apply it, learning on the HSSP database, and observe that a set of 20 profiles is enough to provide a better statistical fit than currently available empirical matrices (WAG, JTT), in particular on saturated data.
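The abstract describes learning a mixture of amino acid profiles by expectation-maximization. As a minimal illustrative sketch (not the paper's actual training code; the parameterization, priors, and data handling in the published method may differ), the core EM loop for fitting K multinomial profiles to site-wise amino acid counts can be written as:

```python
import numpy as np

def em_profile_mixture(counts, K, n_iter=100, seed=0):
    """Fit a K-component multinomial mixture of amino acid profiles
    to per-site count data via EM. Hypothetical sketch: function name,
    initialization, and pseudocount are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n_sites, n_aa = counts.shape
    # Random initialization: K profiles (rows sum to 1) and uniform weights.
    profiles = rng.dirichlet(np.ones(n_aa), size=K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each profile for each site,
        # computed in log space for numerical stability.
        log_lik = counts @ np.log(profiles).T + np.log(weights)  # (n_sites, K)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and profiles from expected counts.
        weights = resp.mean(axis=0)
        profiles = resp.T @ counts + 1e-6  # small pseudocount avoids log(0)
        profiles /= profiles.sum(axis=1, keepdims=True)
    return profiles, weights
```

On synthetic data drawn from two well-separated profiles, the recovered `profiles` rows are valid probability vectors and the `weights` sum to one; selecting K (e.g. 20, 40, 60 components) is the model-choice question the cited Bayes-factor comparisons address.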

Cited by 305 publications (303 citation statements)
References 40 publications
“…However, because the 20, 40, or 60 profiles were obtained from a framework operating at the amino acid level only (i.e., that does not attempt to tease out mutational and selective effects), the Bayes factors associated with their use in the mutation-selection framework are likely to be lower than they might be with empirical mixtures obtained by incorporating the nucleotide (codon)-level data within the training methods described in ref. 12. Despite this weakness of the plug-in we attempt here, the natural log Bayes factors are ∼236, 255, and 269 for the MG-MutSelC20, MG-MutSelC40, and MG-MutSelC60 models, respectively (computed against the MG model).…”
Section: Results
confidence: 93%
“…For these empirical versions, the mixture model is predefined, in terms of the number of components and in terms of the profiles of each component, which are fixed to those inferred in ref. 12; only the weights of the components are free parameters, endowed with a flat Dirichlet prior. Depending on the number of components, we refer to these models as MG-MutSelC20, MG-MutSelC40, and MG-MutSelC60.…”
Section: Results
confidence: 99%
“…However, a series of empirical observations suggest that finite mixtures are not sufficient. First, the empirical fit shows a substantial increase with the number of components even when this number is high [28,32], indicating that even relatively rich mixtures do not capture a sufficient fraction of the available variation. Forcing the number of categories to a relatively small number is possible.…”
Section: (C) Modelling Variation Across Sites
confidence: 99%
“…The substitution model employed was LG+Γ; branch support was assessed by the rapid bootstrapping algorithm that is an inherent part of the best-tree search strategy of RAxML. To test the robustness of the tree topologies, we also employed ML inference using the program PhyML-CAT and the empirical profile mixture model C20 [35] with gamma correction (four categories) of the among-site rate heterogeneity; Chi2-based parametric branch support was calculated using the approximate likelihood ratio test implemented in PhyML (-b -2 option). Trees were visualized using iTOL (http://itol.embl.de/ [36]) and rendered for publication using a graphical editor.…”
Section: Alignment and Phylogenetic Analyses
confidence: 99%