2018
DOI: 10.1101/460121
Preprint

The Prevalence and Impact of Model Violations in Phylogenetic Analysis

Abstract: In phylogenetic inference we commonly use models of substitution which assume that sequence evolution is stationary, reversible and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogen…

Cited by 7 publications (3 citation statements)
References 115 publications
“…I + G excluded from model selection since these parameters are not independent of each other [70,71]). Five partitioned maximum-likelihood (ML) analyses [72] were then performed in IQ-Tree for each dataset with the following settings: keep identical sequences (--keepidentical), remove partitions violating stationarity and homogeneity assumptions (--symtest-removebad; [73]), 1000 ultrafast bootstrap replicates further optimized by nearest neighbour interchange based on bootstrap alignments (-B 1000 -bnni; [74]) and 1000 Shimodaira-Hasegawa-like approximate likelihood ratio test replicates (-alrt 1000; [75]). The phylogeny with the best log-likelihood was then selected for each dataset.…”
Section: Phylogenetic Inference
mentioning
confidence: 99%
“…Model violations are a sneaky problem that cannot be fixed by adding more data (as opposed to the stochastic errors from uneven sampling discussed before), but it can even be exacerbated by adding more loci, leading to faulty inferences with strong statistical support [64,65]. The assumptions that commonly used substitution models make about the evolutionary process at each site are frequently violated across the genome [66,67]. In birds, this is likely also the case because GC content varies considerably between avian lineages [10,68].…”
Section: Incongruence As a Sign Of Data Problems
mentioning
confidence: 99%
“…Tests that rely on site pattern binning also result in a loss of information necessary to detect departures from time-reversibility of the substitution process in different lineages. Such departures can be visualized by Bowker's test of symmetry (Bowker 1948; reviewed in Jermiin et al 2017 and Naser-Khdour et al 2019) that checks the null hypothesis of equality of occurrences of the forward and reverse substitutions in a pairwise comparison of sequences (see Supplementary Appendix S1 available on Dryad at http://dx.doi.org/10.5061/dryad.4f4qrfjc8). Considering the above null hypothesis here helps to highlight deviations from time-reversible assumptions concerning forward and reverse substitutions among sequences.…”
mentioning
confidence: 99%
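
The last statement describes Bowker's test of symmetry as a pairwise check that forward and reverse substitutions occur equally often. A minimal Python sketch of the classical statistic follows. It is not the maximal matched-pairs implementation of the preprint, nor the code used by any citing paper. The function names (divergence_matrix, chi2_sf, bowker_test) are hypothetical, and skipping sites that contain gaps or ambiguity codes is an assumption made here for simplicity.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"

def divergence_matrix(seq_a, seq_b):
    """Count site patterns: entry [i][j] is the number of aligned sites
    with nucleotide i in seq_a and nucleotide j in seq_b."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    index = {nt: i for i, nt in enumerate(NUCLEOTIDES)}
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with gaps or ambiguity codes are ignored.
        if a in index and b in index:
            counts[index[a]][index[b]] += 1
    return counts

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df,
    built from Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    if df <= 0:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def bowker_test(seq_a, seq_b):
    """Return (statistic, degrees of freedom, p-value) for the null
    hypothesis that the divergence matrix is symmetric."""
    n = divergence_matrix(seq_a, seq_b)
    stat, df = 0.0, 0
    for i, j in combinations(range(4), 2):
        total = n[i][j] + n[j][i]
        if total > 0:  # empty off-diagonal pairs contribute no information
            stat += (n[i][j] - n[j][i]) ** 2 / total
            df += 1
    p_value = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p_value

# Toy alignment with eight A>C sites and only one C>A site.
stat, df, p = bowker_test("AAAAAAAACCGT", "CCCCCCCCACGT")
print(f"X2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

A small p-value indicates that forward and reverse substitutions between the two sequences are unbalanced, which is evidence against the symmetry expected under the null hypothesis. Reducing the degrees of freedom when an off-diagonal pair is empty is one common convention; other implementations may fix the degrees of freedom at 6 for nucleotide data.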